We consider the behavior of spatial point processes when subjected to a class of linear transformations indexed by a variable $T$. It was shown in Ellis [Adv. in Appl. Probab. 18 (1986) 646-659] that, under mild assumptions, the transformed processes behave approximately like Poisson processes for large $T$. In this article, under very similar assumptions, explicit upper bounds are given for the $d_2$-distance between the corresponding point process distributions. A number of related results, and applications to kernel density estimation and long range dependence testing, are also presented. The main results are proved by applying a generalized Stein-Chen method to discretized versions of the point processes.

1. Introduction. Let $D_1, D_2 \in \mathbb N = \{1, 2, 3, \ldots\}$ and $D = D_1 + D_2$. Consider a point process $\xi$ on $\mathbb R^D = \mathbb R^{D_1} \times \mathbb R^{D_2}$ which has expectation measure $\nu$ and meets three conditions, namely, absolute continuity of $\nu$ with a mild restriction on the density, an orderliness condition in the $\mathbb R^{D_1}$-directions and a mixing condition in the $\mathbb R^{D_2}$-directions (formal versions of these conditions can be found at the end of this section). Let $\eta$ be a Poisson process with the same expectation measure and let $\theta_T: \mathbb R^D \to \mathbb R^D$ be the linear transformation that stretches the first $D_1$ coordinates by a factor $w(T)^{1/D_1}$ and compresses the last $D_2$ coordinates by a factor $T^{1/D_2}$; that is, for $T \in \mathbb R$, $T \ge 1$, we set

$$\theta_T(s,t) := \Bigl( w(T)^{1/D_1}\, s,\ \frac{t}{T^{1/D_2}} \Bigr) \quad \text{for all } (s,t) \in \mathbb R^{D_1} \times \mathbb R^{D_2} = \mathbb R^D,$$

where $w(T) \to \infty$ and $w(T) = O(T)$ for $T \to \infty$. In particular, we usually
write Op instead of Op if our stretch factor is T l / Dl . 
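The following Python sketch illustrates $\theta_T$ and the pre-image $J_T = \theta_T^{-1}(J)$ for concrete points. The helper names `theta_T` and `in_J_T` are hypothetical. The value $w(T)$ is passed in as a plain number, so the sketch assumes no particular choice of $w$.

```python
def theta_T(point, T, w_T, D1):
    """Apply theta_T to a D-tuple: multiply the first D1 coordinates by
    w(T)**(1/D1) and divide the remaining D2 coordinates by T**(1/D2)."""
    D2 = len(point) - D1
    stretch = w_T ** (1.0 / D1)
    squeeze = T ** (1.0 / D2)
    return tuple(x * stretch for x in point[:D1]) + tuple(x / squeeze for x in point[D1:])


def in_J_T(point, T, w_T, D1):
    """A point lies in J_T = theta_T^{-1}([-1, 1)^D) iff its image lies in [-1, 1)^D."""
    return all(-1.0 <= y < 1.0 for y in theta_T(point, T, w_T, D1))


# Example with D1 = D2 = 1, T = 4, w(T) = 2: J_T = [-1/2, 1/2) x [-4, 4).
print(theta_T((0.5, 2.0), T=4.0, w_T=2.0, D1=1))   # image of a point on the boundary of J_T
print(in_J_T((0.49, 3.9), T=4.0, w_T=2.0, D1=1))
```

Because the cube $J$ is half-open, points on the right boundary of $J_T$ (such as $(0.5, 2.0)$ above) are excluded.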
Most of the time we will restrict our transformed processes $\xi\theta_T^{-1}$ and $\eta\theta_T^{-1}$ to a bounded cube $J := [-1, 1)^D$ and denote by $J_T := \theta_T^{-1}(J)$ the pre-image of $J$. Sometimes, however, the bigger cuboids $\bar J_T := [-w(T)^{1/D_1}, w(T)^{1/D_1})^{D_1} \times [-1, 1)^{D_2}$ are more useful than $J$.

A consequence of what Ellis (1986) showed is the following. For bounded measurable functions $f_T: J \to \mathbb R$ with $\|f_T\|_\infty = O(\sqrt{w(T)/T})$, the distributions of $\int_J f_T \, d(\xi\theta_T^{-1})$ and $\int_J f_T \, d(\eta\theta_T^{-1})$ get more and more alike as $T \to \infty$. More precisely, the difference between their characteristic functions converges uniformly to zero on every compact subset of $\mathbb R$ as $T \to \infty$. Therefore, there is hope that $d(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J))$ can be shown to be small for large $T$. This requires choosing for $d$ a probability distance between distributions of point processes which metrizes a topology that is equal to, or not too much finer than, the weak topology (i.e., the topology of convergence in distribution).

Our choice for $d$ will be the $d_2$-distance [see Barbour, Holst and Janson (1992), Section 10.2]. Besides meeting the aforementioned requirement, it has a number of other useful properties. It is rather easy to handle, and bounds on $d_2(\mathcal L(\xi_1), \mathcal L(\xi_2))$ for point processes $\xi_1, \xi_2$ imply bounds on $|\mathbb E f(\xi_1) - \mathbb E f(\xi_2)|$ for a number of desirable functions $f$.

The $d_2$-distance can be constructed as two Wasserstein distances, one on top of the other, in the following way. Consider a compact set $\mathcal X \subset \mathbb R^D$ and write $\mathfrak M_p$ for the space of point measures on $\mathcal X$. Let $d_0$ be the usual Euclidean distance on $\mathbb R^D$, but bounded by 1, and set $\mathcal F_1 := \{k: \mathcal X \to \mathbb R;\ |k(x_1) - k(x_2)| \le d_0(x_1, x_2)\}$. Define the $d_1$-distance (w.r.t. $d_0$) between point measures $\rho_1, \rho_2 \in \mathfrak M_p$ by

$$d_1(\rho_1, \rho_2) := \begin{cases} 1 & \text{if } |\rho_1| \ne |\rho_2|, \\[4pt] \dfrac{1}{N} \displaystyle\sup_{k \in \mathcal F_1} \Bigl| \int k \, d\rho_1 - \int k \, d\rho_2 \Bigr| & \text{if } |\rho_1| = |\rho_2| = N \ge 1, \\[4pt] 0 & \text{if } |\rho_1| = |\rho_2| = 0, \end{cases}$$


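where $|\rho_i| := \rho_i(\mathcal X) < \infty$. It can be seen that $(\mathfrak M_p, d_1)$ is a complete, separable metric space and that $d_1$ is bounded by 1. Furthermore, let $|\rho_1| = |\rho_2| =: n \ge 1$, say $\rho_1 = \sum_{i=1}^n \delta_{x_i}$ and $\rho_2 = \sum_{i=1}^n \delta_{y_i}$. In this case the Kantorovich-Rubinstein theorem [see Dudley (1989), Section 11.8] yields that

$$d_1(\rho_1, \rho_2) = \min_{\pi \in S_n} \frac{1}{n} \sum_{i=1}^n d_0\bigl(x_i, y_{\pi(i)}\bigr), \tag{1.1}$$

where $S_n$ is the set of permutations of $\{1, 2, \ldots, n\}$. Now let $\mathcal F_2 := \{f: \mathfrak M_p \to \mathbb R;\ |f(\rho_1) - f(\rho_2)| \le d_1(\rho_1, \rho_2)\}$. We define the $d_2$-distance (w.r.t. $d_0$) between probability measures $P$ and $Q$ on $\mathfrak M_p$ (distributions of point processes on $\mathcal X$) by

$$d_2(P, Q) := \sup_{f \in \mathcal F_2} \Bigl| \int f \, dP - \int f \, dQ \Bigr|.$$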
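For small point patterns, the permutation representation (1.1) can be evaluated by brute force. The sketch below uses hypothetical helper names `d0` and `d1`, with point measures given as lists of coordinate tuples in $\mathcal X$. It enumerates all $n!$ pairings, so it is meant only as an illustration with a handful of points.

```python
import itertools
import math


def d0(x, y):
    """Euclidean distance truncated at 1."""
    return min(math.dist(x, y), 1.0)


def d1(rho1, rho2):
    """d_1-distance between two finite point patterns (lists of coordinate tuples).
    Uses the permutation representation (1.1) when the point counts agree."""
    n = len(rho1)
    if n != len(rho2):
        return 1.0
    if n == 0:
        return 0.0
    best = min(
        sum(d0(x, rho2[j]) for x, j in zip(rho1, perm))  # pair x_i with y_{pi(i)}
        for perm in itertools.permutations(range(n))
    )
    return best / n
```

For larger patterns one would replace the enumeration with an assignment solver, for instance `scipy.optimize.linear_sum_assignment` applied to the matrix of truncated distances $d_0(x_i, y_j)$. The result is the same, since (1.1) is an optimal assignment problem.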
By the Kantorovich-Rubinstein theorem, one obtains that

$$d_2(P, Q) = \min_{\xi_1 \sim P,\ \xi_2 \sim Q} \mathbb E\, d_1(\xi_1, \xi_2) \tag{1.2}$$

[the minimum is attained because $(\mathfrak M_p, d_1)$ is complete; see Rachev (1984)]. Furthermore, because of the bound on the $d_1$-distance, the $d_2$-distance can also be interpreted as a variant of a bounded Wasserstein distance (see below). Hence, Theorem 11.3.3 in Dudley (1989) yields that $d_2$ metrizes the weak convergence of point process distributions. In other words, for point processes $\xi, \xi_1, \xi_2, \ldots$ on $\mathcal X$, we have

$$\xi_n \xrightarrow{\ \mathcal D\ } \xi \quad \text{iff} \quad d_2(\mathcal L(\xi_n), \mathcal L(\xi)) \to 0, \tag{1.3}$$

where convergence in distribution for point processes is defined in the usual sense [see Kallenberg (1986), Section 4.1]. The crucial fact here is that, for $d_0$ as defined, the topology generated by the metric $d_1$ on $\mathfrak M_p$ is equal to the vague topology. The vague topology is the one used to define convergence in distribution for point processes.

$d_2$ is the distance that we are mainly interested in, but we will also deal with two other probability distances. On the one hand, we use the total variation distance between distributions $\mu_1$ and $\mu_2$ on $\mathbb Z_+$, which is defined as

$$d_{TV}(\mu_1, \mu_2) := \sup_{A \subset \mathbb Z_+} |\mu_1(A) - \mu_2(A)|$$

and can be equivalently written in the form

$$d_{TV}(\mu_1, \mu_2) = \min_{X_1 \sim \mu_1,\ X_2 \sim \mu_2} P[X_1 \ne X_2]; \tag{1.4}$$


on the other hand, we use the bounded Wasserstein distance between distributions $\mu_1$ and $\mu_2$ on $\mathbb R$, which is defined as

$$d_{BW}(\mu_1, \mu_2) := \sup_{f \in \mathcal F_{BW}} \Bigl| \int_{\mathbb R} f \, d\mu_1 - \int_{\mathbb R} f \, d\mu_2 \Bigr|,$$

where

$$\mathcal F_{BW} := \bigl\{f: \mathbb R \to \mathbb R;\ |f(x) - f(y)| \le |x - y| \text{ and } |f(x)| \le 1 \text{ for } x, y \in \mathbb R\bigr\}$$

is the set of Lipschitz continuous functions with constant 1 that are bounded by 1. For equivalent expressions and properties, see Barbour, Holst and Janson (1992), Appendix A.1, for the total variation distance, and Dudley (1989), Section 11.3, for the bounded Wasserstein distance.
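As a concrete instance of the total variation distance on $\mathbb Z_+$, the following sketch computes $d_{TV}(\mathrm{Po}(a), \mathrm{Po}(b))$ as half the $\ell_1$-distance between the two probability vectors. The vectors are truncated at a cut-off `kmax` chosen by the caller. This truncation is an assumption of the sketch; the neglected tail is negligible when `kmax` is much larger than $a$ and $b$. The function names are hypothetical. The sketch can be used to observe the elementary bound $d_{TV}(\mathrm{Po}(a), \mathrm{Po}(b)) \le |a - b|$.

```python
import math


def poisson_pmf(mean, kmax):
    """P[X = k] for k = 0..kmax, X ~ Po(mean), via the stable recursion
    p_{k+1} = p_k * mean / (k + 1) (avoids large factorials)."""
    probs = [math.exp(-mean)]
    for k in range(kmax):
        probs.append(probs[-1] * mean / (k + 1))
    return probs


def d_tv_poisson(a, b, kmax=200):
    """Total variation distance between Po(a) and Po(b), computed as
    (1/2) * sum_k |P[X_a = k] - P[X_b = k]| over k = 0..kmax (tail neglected)."""
    pa, pb = poisson_pmf(a, kmax), poisson_pmf(b, kmax)
    return 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


print(d_tv_poisson(1.0, 1.5), "<=", abs(1.0 - 1.5))
```

For $a = 0$ the distribution $\mathrm{Po}(0)$ is the point mass at 0, and the sketch reproduces $d_{TV}(\mathrm{Po}(0), \mathrm{Po}(b)) = 1 - e^{-b}$.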
The main goal of our endeavors will be to find upper estimates for the distance $d_2(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J))$ (see Section 2.2). Explicit upper bounds will also be computed for the following distances:

- $d_{TV}(\mathcal L(\xi\theta_T^{-1}(J)), \mathcal L(\eta\theta_T^{-1}(J)))$ (Section 2.3);
- $d_2(\mathcal L(\xi\theta_T^{-1}|_{\bar J_T}), \mathcal L(\eta\theta_T^{-1}|_{\bar J_T}))$ (Section 2.4);
- $d_2(\mathcal L(\xi\theta_T^{-1}|_{\bar J_T}), \mathrm{Po}(\nu'|_{\bar J_T}))$ for an appropriate $T$-independent measure $\nu'$ on $\mathbb R^D$ (Section 2.5).

Throughout the article we use $\mathrm{Po}(\nu')$ in two senses. If $\nu'$ is a positive real number, it denotes the Poisson distribution with parameter $\nu'$. If $\nu'$ is a boundedly finite measure, it denotes the distribution of the Poisson process with parameter measure $\nu'$.

In Section 3 we present some applications of our results. Most importantly, we calculate an upper bound for the bounded Wasserstein distance between the distribution of a kernel estimate of the density of $\nu$ at a certain point and the actual value of the density at that point. Furthermore, we briefly describe an application to testing for long range dependence.

The paper of Ellis (1986) provided the initial motivation for many of the theorems in this article. Apart from it, stretched point processes have also been investigated in the context of light traffic analysis for queues and in other, similar topics; see, for example, Borovkov (1996) and the references therein. These authors, however, were interested in the quite different question of finding asymptotic expansions for the expectation of functionals of purely stretched marked point processes, which vanish in the limit on every compact set. Our procedure, in contrast, leads to point processes with, essentially, a stable or increasing number of points in every compact set.

We conclude this section by having a detailed look at the three conditions for the point process $\xi$.

Condition 1 (Absolute continuity of the expectation measure). Let $\mu = \mu_1 \otimes \mu_2$, where $\mu_1 := \lambda^{D_1}$ is the Lebesgue measure on $\mathbb R^{D_1}$. For $\mu_2$ there are two options: either $\mu_2 := \lambda^{D_2}$ is the Lebesgue measure on $\mathbb R^{D_2}$, or $\mu_2$ is the counting measure on $(\mathbb Z + \tfrac12)^{D_2} \subset \mathbb R^{D_2}$.

Then we require that $\nu \ll \mu$ with a Radon-Nikodym density $\varrho$ such that $\kappa \in \mathbb R_+$ exists with

$$\kappa_T := \sup_{(s,t) \in J_T} \varrho(s,t) \le \kappa \quad \text{for all } T \ge 1.$$

In the same way, we choose $\iota \in \mathbb R_+$ with

$$\iota_T := \inf_{(s,t) \in J_T} \varrho(s,t) \ge \iota \quad \text{for all } T \ge 1.$$

(For the asymptotic result it is enough, of course, to assume both statements only for all $T$ bigger than some $T_0 \ge 1$.)
Condition 2 (Orderliness). There is a continuous function $\alpha: \mathbb R_+ \to \mathbb R_+$ with $\alpha(0) = 0$ such that the following holds for every rectangle $C := [a, b) \times [c, d)$ with $a, b \in \mathbb R^{D_1}$, $a < b$, and $c, d \in \mathbb R^{D_2}$, $c < d$:

$$\mathbb E\bigl[(\xi(C))^2\, 1\{\xi(C) \ge 2\}\bigr] \le v\, \alpha(v),$$

where

$$v = v(C) = \mu_1([a, b))\, \mu_2([c, d + \mathbf 1)),$$

with $\mathbf 1 := (1, \ldots, 1) \in \mathbb R^{D_2}$.

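For the third condition, there are different versions that can be considered. According to the type of mixing we are interested in, we write this condition as $3\chi$, where $\chi \in \{\beta, \rho, \varphi\}$:

Condition $3\chi$ ($\chi$-mixing property). For every interval $[a, b) \subset \mathbb R^{D_1}$, $a < b$, there is a decreasing function $\beta := \beta_{a,b}: \mathbb R_+ \to \mathbb R_+$ with the following two properties:

(a) $\beta(u) = o(u^{-D_2/2})$ for $u \to \infty$.

(b) Let $c, d \in \mathbb R^{D_2}$ with $c < d$, let $t \in \mathbb R_+$, and define the $\sigma$-fields $\mathcal F_{\mathrm{int}} := \sigma\bigl(\xi|_{[a,b) \times [c,d)}\bigr)$ and $\mathcal F_{\mathrm{ext}} := \sigma\bigl(\xi|_{[a,b) \times [c - t\mathbf 1,\, d + t\mathbf 1)^c}\bigr)$. Then

$$\chi(\mathcal F_{\mathrm{int}}, \mathcal F_{\mathrm{ext}}) \le \beta(t),$$

where $\chi$ is one of the three mixing coefficients $\beta$, $\rho$ or $\varphi$ with

$$\beta(\mathcal F_{\mathrm{int}}, \mathcal F_{\mathrm{ext}}) := \mathbb E \operatorname*{ess\,sup}_{B \in \mathcal F_{\mathrm{ext}}} \bigl|P(B \mid \mathcal F_{\mathrm{int}}) - P(B)\bigr|,$$

$$\rho(\mathcal F_{\mathrm{int}}, \mathcal F_{\mathrm{ext}}) := \sup_{X \in L^2(\mathcal F_{\mathrm{int}}),\ Y \in L^2(\mathcal F_{\mathrm{ext}})} |\mathrm{corr}(X, Y)|,$$

$$\varphi(\mathcal F_{\mathrm{int}}, \mathcal F_{\mathrm{ext}}) := \sup_{A \in \mathcal F_{\mathrm{int}},\ P(A) > 0,\ B \in \mathcal F_{\mathrm{ext}}} |P(B \mid A) - P(B)|.$$

In the following we suppress the indication of the interval $[a, b)$ and simply write $\beta$. The corner points $a$ and $b$ are to be chosen appropriately. For example, $a = -\sup_{T \ge 1} (1/w(T))^{1/D_1} \cdot (1, \ldots, 1)$ and $b = \sup_{T \ge 1} (1/w(T))^{1/D_1} \cdot (1, \ldots, 1)$ is always an appropriate choice.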

No further explanation is needed for the first condition. It simply states the absolute continuity of the expectation measure with respect to what is basically Lebesgue measure, with a mild condition on the density. We admit the counting measure for the $\mathbb R^{D_2}$-part of the reference measure $\mu$. This allows us to apply our future estimates to (mixing) sequences of certain $\mathbb R^{D_1}$-valued point processes. In order to simplify certain formulas, we will always tacitly assume that $T \in \{n^{D_2};\ n \in \mathbb N\}$ if $\mu_2$ is the counting measure.
The second condition is a form of orderliness in the $\mathbb R^{D_1}$-directions. For a detailed account of orderliness, see Daley (1974). For what we are interested in here, it is enough to understand two consequences. First, since $(\xi(C))^2 \ge 4$ on $\{\xi(C) \ge 2\}$, the upper bound for $\mathbb E[(\xi(C))^2 1\{\xi(C) \ge 2\}]$ implies that

$$4 P[\xi(C) \ge 2] \le v\, \alpha(v).$$

Second, Condition 2 implies the simplicity of $\xi$ (i.e., $P[\xi(\{x\}) \le 1 \ \forall x \in \mathbb R^D] = 1$). The latter implication is due to Theorem 2.6 in Kallenberg (1986).

The various versions of the third condition are mixing conditions of different strength. It can be seen [Doukhan (1994)] that

$$\beta(\mathcal B, \mathcal C) \le \varphi(\mathcal B, \mathcal C) \quad \text{and} \quad \rho(\mathcal B, \mathcal C) \le 2\sqrt{\varphi(\mathcal B, \mathcal C)}$$

for arbitrary $\sigma$-fields $\mathcal B, \mathcal C \subset \mathcal F$ on some common probability space $(\Omega, \mathcal F, P)$. Thus, $\varphi$-mixing is the strongest of the three concepts. It is followed by $\beta$-mixing and $\rho$-mixing, which are not generally comparable with each other, although from an empirical point of view $\beta$-mixing often turns out to be the stronger of the two. Two mixing concepts are not treated here: $\alpha$-mixing, which would be weaker, and $\psi$-mixing, which would be stronger than any of the three mentioned concepts [see Doukhan (1994)].

The kind of mixing used in Ellis (1986) is $\rho$-mixing. However, it is important to notice that we need a stronger mixing condition, in the sense that the set underlying the $\sigma$-field $\mathcal F_{\mathrm{ext}}$ may enclose the set underlying the $\sigma$-field $\mathcal F_{\mathrm{int}}$ from all of the $2D_2$ possible directions of $\mathbb R^{D_2}$. There is partial compensation for this. The order we need for the convergence of our mixing coefficient to zero is only half the order that was needed for Ellis' result. What is more, we could actually manage with a mixing condition where the $\sigma$-fields $\mathcal F_{\mathrm{ext}}$ and $\mathcal F_{\mathrm{int}}$ are quite a bit smaller, namely generated by the numbers of points of $\xi$ in the corresponding discretization cuboids that we will need for the proof.

2. The main results. The results given within this section have a somewhat similar flavor, and their proofs all follow the same path: first discretizing the point processes and then applying a local Stein theorem. An outline of this method can be found in Section 2.1; the different results are then presented in Sections 2.2-2.5. A detailed, self-contained proof is given only for Theorem 2.A; for the other statements, only the necessary adaptations are given.

2.1. The approach. All statements in Section 2 are about upper bounds for distances between the distribution of a transformed $\xi$-process and the distribution of a transformed Poisson process (or a function of the respective process, as in Section 2.3). For the sake of clarity of presentation, we formulate the ideas of the proof only for $d_2(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J))$. Apart from the obvious changes in notation, the arguments presented here can be applied literally to calculate the presented upper bounds for any of the distances appearing in this section. An example of such a change is writing $\xi\theta_T^{-1}|_{\bar J_T}$ instead of $\xi\theta_T^{-1}|_J$ in Section 2.4. The one exception is Section 2.3, where the arguments apply almost literally.

As mentioned before, our basic strategy of proof has two steps. First we discretize $\xi\theta_T^{-1}|_J$ and $\eta\theta_T^{-1}|_J$ (in general, the point processes involved). Then we apply an estimate, obtained by a generalized version of the Stein-Chen method, to the discretized point processes. In Section 2.3, where only the numbers of points are involved, the classic Stein-Chen method will in fact be enough. The corresponding estimate can be found in the Appendix.

The discretizations are carried out as follows. For every $T \ge 1$ and for $h(T) \ge 1$, set $n_1 := \lceil h(T)^{1/D_1} \rceil - 1$ and $n_2 := \lceil T^{1/D_2} \rceil - 1$. Here $\lceil x \rceil$ denotes, for any $x \in \mathbb R$, the smallest integer $z \ge x$. We subdivide $J_T$ into smaller "discretization cuboids" $C_{\mathbf{kl}}$ with lengths 1 in the $\mathbb R^{D_2}$-directions and widths $(w(T)h(T))^{-1/D_1}$ in the $\mathbb R^{D_1}$-directions, whenever the $C_{\mathbf{kl}}$ are not too close to the boundary of $J_T$. Here $h(T)$ can be thought of as the order of the number of discretization cuboids in the $\mathbb R^{D_1}$-directions [there are $2\lceil h(T)^{1/D_1} \rceil$ of them in every dimension of $\mathbb R^{D_1}$]. To be more precise, we set, for every $T \ge 1$,

$$C_{\mathbf{kl}} := C_{\mathbf{kl}}^{(T)} := \Biggl( \prod_{r=1}^{D_1} \Bigl[ \frac{-n_1 + k_r - 1}{(w(T)h(T))^{1/D_1}},\ \frac{-n_1 + k_r}{(w(T)h(T))^{1/D_1}} \Bigr) \times \prod_{s=1}^{D_2} \bigl[ -n_2 + (l_s - 1),\ -n_2 + l_s \bigr) \Biggr) \cap J_T$$

for all $\mathbf k = (k_1, k_2, \ldots, k_{D_1}) \in \{0, 1, \ldots, 2n_1 + 1\}^{D_1}$ and $\mathbf l = (l_1, l_2, \ldots, l_{D_2}) \in \{0, 1, \ldots, 2n_2 + 1\}^{D_2}$, so that $J_T = \bigcup_{\mathbf k, \mathbf l} C_{\mathbf{kl}}$.

In order to reduce the complexity of presentation, we will use simplified notations for multi-indices whose meaning should be obvious. For instance, we write, in short, $\sum_{\mathbf k = 0}^{2n_1+1} a_{\mathbf k}$ instead of $\sum_{k_1, \ldots, k_{D_1} = 0}^{2n_1+1} a_{\mathbf k}$, or $\mathbf k \in \{0, 1, \ldots, 2n_1 + 1\}$ instead of $\mathbf k \in \{0, 1, \ldots, 2n_1 + 1\}^{D_1}$. Also, where not stated otherwise, the ranges of the indices in expressions like $\sum_{\mathbf k, \mathbf l}$ or $\bigcup_{\mathbf k, \mathbf l}$ are given by $\mathbf k \in \{0, 1, \ldots, 2n_1 + 1\}$, $\mathbf l \in \{0, 1, \ldots, 2n_2 + 1\}$. Some more notation is needed.
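We denote by $\alpha_{\mathbf{kl}}$ the centre of $C_{\mathbf{kl}}$ and define in the image space of the transformation $\theta_T$

$$R_{\mathbf{kl}} := R_{\mathbf{kl}}^{(T)} := \theta_T(C_{\mathbf{kl}}) = \Biggl( \prod_{r=1}^{D_1} \Bigl[ \frac{-n_1 + k_r - 1}{h(T)^{1/D_1}},\ \frac{-n_1 + k_r}{h(T)^{1/D_1}} \Bigr) \times \prod_{s=1}^{D_2} \Bigl[ \frac{-n_2 + l_s - 1}{T^{1/D_2}},\ \frac{-n_2 + l_s}{T^{1/D_2}} \Bigr) \Biggr) \cap J$$

for all $\mathbf k, \mathbf l$. We write $\beta_{\mathbf{kl}}$ for the centre of $R_{\mathbf{kl}}$ [correspondingly, we use $\bar R_{\mathbf{kl}} := \theta_T(\bar C_{\mathbf{kl}})$ and $\bar\beta_{\mathbf{kl}}$ in Section 2.4].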

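The discretization $\Xi$ of the point process $\xi$ is obtained by setting a point in the middle of every discretization cuboid $C_{\mathbf{kl}}$ which contains any points of $\xi$. Formally, we set

$$I_{\mathbf{kl}} := I_{\mathbf{kl}}^{(T)} := 1\{\xi(C_{\mathbf{kl}}) \ge 1\}, \qquad p_{\mathbf{kl}} := \mathbb E I_{\mathbf{kl}} \quad \text{for all } \mathbf k, \mathbf l,$$

$$W := W(T) := \sum_{\mathbf k, \mathbf l} I_{\mathbf{kl}}, \qquad \lambda := \mathbb E W = \sum_{\mathbf k, \mathbf l} p_{\mathbf{kl}},$$

and define $\Xi$ as

$$\Xi := \sum_{\mathbf k, \mathbf l} I_{\mathbf{kl}}\, \delta_{\alpha_{\mathbf{kl}}}.$$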
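The construction of $I_{\mathbf{kl}}$ and $\Xi$ can be sketched in Python as follows. This is a hypothetical illustration, not code from the article. The function `discretize` takes points of $\xi$ lying in $J_T$ (as $D$-tuples). For each occupied cuboid $C_{\mathbf{kl}}$, it returns the centre $\alpha_{\mathbf{kl}}$ of $C_{\mathbf{kl}}$ (after intersection with $J_T$), so the number of returned entries is $W(T)$. Points that fall into a common cuboid are merged, which is exactly the loss controlled by the orderliness condition. Floating-point rounding at cuboid boundaries is not treated carefully.

```python
import math


def discretize(points, T, w_T, h_T, D1):
    """Return {(k, l): centre of C_kl} for every cuboid C_kl that contains
    at least one of the given points (D-tuples assumed to lie in J_T)."""
    points = list(points)
    if not points:
        return {}
    D2 = len(points[0]) - D1
    n1 = math.ceil(h_T ** (1.0 / D1)) - 1
    n2 = math.ceil(T ** (1.0 / D2)) - 1
    a1 = (w_T * h_T) ** (-1.0 / D1)   # width of C_kl in each R^{D1}-direction
    half1 = w_T ** (-1.0 / D1)        # J_T = [-half1, half1)^{D1} x [-half2, half2)^{D2}
    half2 = T ** (1.0 / D2)
    occupied = {}
    for p in points:
        # multi-indices k, l of the cuboid containing p
        k = tuple(math.floor(s / a1) + n1 + 1 for s in p[:D1])
        l = tuple(math.floor(t) + n2 + 1 for t in p[D1:])
        if (k, l) in occupied:
            continue  # a further point in an occupied cuboid is lost in Xi
        # centre of C_kl, i.e. of the elementary cuboid intersected with J_T
        c1 = tuple((max((kr - n1 - 1) * a1, -half1) + min((kr - n1) * a1, half1)) / 2 for kr in k)
        c2 = tuple((max(ls - n2 - 1, -half2) + min(ls - n2, half2)) / 2 for ls in l)
        occupied[(k, l)] = c1 + c2
    return occupied


xi_points = [(0.01, 0.2), (0.02, 0.7), (-0.3, -3.5)]
Xi = discretize(xi_points, T=4.0, w_T=2.0, h_T=4.0, D1=1)
print(len(Xi), Xi)
```

With $T = 4$, $w(T) = 2$, $h(T) = 4$ and $D_1 = D_2 = 1$, the first two points share a cuboid. Hence $W(T) = 2$ although $\xi\theta_T^{-1}(J) = 3$.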

Consider the error we make in the transition from $\xi\theta_T^{-1}|_J$ to $\Xi\theta_T^{-1}$, measured in terms of the $d_2$-distance. (With a slight alteration, the argument also holds for the $d_{TV}$-distance between the numbers of points; see Section 2.3.) This error is small for large $T$ for two reasons. First, the orderliness condition (Condition 2) ensures that the probability of two points within the same discretization cuboid is small, and hence so is the probability of any point vanishing in the transition. Second, we have chosen our discretization so that we only have to move points by a distance of at most half a body diagonal of a discretization cuboid $R_{\mathbf{kl}}$ ($\bar R_{\mathbf{kl}}$ in Section 2.4) in the image space, which is small for large $T$ as well.

As a discretization (at least "in distribution") of the Poisson point process $\eta$, we take

$$\mathrm H := \sum_{\mathbf k, \mathbf l} \zeta_{\mathbf{kl}}\, \delta_{\alpha_{\mathbf{kl}}},$$

where the $\zeta_{\mathbf{kl}}$ are arbitrary independent $\mathrm{Po}(p_{\mathbf{kl}})$-distributed random variables for $0 \le \mathbf k \le 2n_1 + 1$, $0 \le \mathbf l \le 2n_2 + 1$. Again, the error we make in the transition from $\eta\theta_T^{-1}|_J$ to $\mathrm H\theta_T^{-1}$ is small. The reasons are quite similar to those stated above for the transition from $\xi\theta_T^{-1}|_J$ to $\Xi\theta_T^{-1}$, with two caveats. The two discretizations were not realized in the same way. Moreover, we have to argue a little more carefully in Section 2.5, where a limiting Poisson process that is independent of $T$ is considered.
We then have an indicator point process $\Xi$ with a local dependence property (stemming from the mixing Condition $3\chi$) and a discrete Poisson point process $\mathrm H$ with the appropriate intensity measure. We are therefore in a position to apply the local Stein Theorem A.D for point processes (or, in the case of Section 2.3, Theorem A.A for sums of indicators), which in each case yields the stated result.

One point about the refinement of our discretization is worth noting. In our main $\rho$-mixing case, we retain the highest possible flexibility by introducing the variable $h(T)$. Setting $h(T) := T$ will often turn out to be a natural and relatively good choice, but in many cases it is not optimal. The optimal choice of $h(T)$ depends on the specific orderliness and mixing conditions that can be obtained for $\xi$. The weaker the orderliness condition [the slower $\alpha(v)$ goes to zero for $v \to 0$], the higher the optimal $h(T)$ will be. Conversely (and somewhat surprisingly at the moment), the weaker the mixing condition [the slower $\beta(u)$ goes to zero for $u \to \infty$], the lower the optimal $h(T)$ will be.

In contrast, no such considerations are necessary for the discretization in the $\mathbb R^{D_2}$-directions. A discretization cuboid length of 1 can easily be seen to be both natural and optimal. A length of higher order in $T$ only increases the distance by which we have to move points for discretizing. A length of lower order in $T$ increases the number of discretization cuboids without changing the order of the length that the orderliness condition "sees" [i.e., without changing the order of $v(C_{\mathbf{kl}})$ with $v$ as in Condition 2].

2.2. The $d_2$-distance between the point processes. In this section the $d_2$-distance between the transformed point processes $\xi\theta_T^{-1}|_J$ and $\eta\theta_T^{-1}|_J$ is considered. In all the results we use the notation $O(f_1(T), \ldots, f_j(T))$ as shorthand for $O(\max\{f_1(T), \ldots, f_j(T)\})$.

2.2.1. Results.

Theorem 2.A (“The principal theorem”). Suppose that the prerequisites of Section 1 hold, including Conditions 1, 2 and $3\rho$, and let $\iota > 0$. Take arbitrary $m := m(T) \in \mathbb Z_+$ and $h(T) \ge 1$ for every $T \ge 1$. Then we obtain

$$d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) = O\biggl( \frac{1}{h(T)^{1/D_1}},\ \frac{1}{T^{1/D_2}},\ \log^+\Bigl(\frac{T}{w(T)}\Bigr) \frac{m^{D_2} + 1}{w(T)},\ \frac{T}{w(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr),\ \log^+\Bigl(\frac{T}{w(T)}\Bigr)\, \alpha\Bigl(\frac{2^D (2m+1)^{D_2}}{w(T)}\Bigr),\ \sqrt{T h(T)}\, \beta(m) \biggr)$$

for $T \to \infty$, where we write $\log^+(x) := 1 + (\log(x) \vee 0)$ for $x > 0$.

For a quantitative form of the upper bound, see (2.10) and (2.11) at the end of the proof. The powers of 2 and 5 that appear in these inequalities have been chosen (for the convenience of calculations) to be unnecessarily large and might be dramatically improved.

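One might now ask under what conditions the $d_2$-distance converges to zero.

Corollary 2.B (Convergence to zero in Theorem 2.A). Suppose that the prerequisites of Theorem 2.A hold. Furthermore, suppose that $w(T) \ge kT^\delta$ for $k > 0$, $\delta \in (0, 1]$, and that

$$\alpha(v) = O(v^r) \quad \text{for } v \to 0, \text{ with } r > 0,$$

$$\beta(u) = O\bigl(u^{-(1+s)D_2/2}\bigr) \quad \text{for } u \to \infty, \text{ with } 1 + s > \max\Bigl\{ \frac{1 - \delta}{\delta} \cdot \frac{1 + r}{r},\ \frac{1}{\delta} \Bigr\}.$$

Then

$$d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) \to 0 \quad \text{for } T \to \infty.$$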
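The exponent bookkeeping behind Corollary 2.B can be checked mechanically. The sketch below makes several assumptions: power-law choices $h(T) = T^q$ and $m(T) = \lfloor T^x \rfloor$, and the worst case $w(T) \asymp T^\delta$ permitted by $w(T) \ge kT^\delta$. Given $\alpha(v) = O(v^r)$ and $\beta(u) = O(u^{-(1+s)D_2/2})$, it returns one admissible pair $(q, x)$. It also returns the exponents of $T$ in the polynomially decaying terms of Theorem 2.A, ignoring logarithmic factors. The names `corollary_2b_holds` and `choose_exponents` are hypothetical.

```python
def corollary_2b_holds(delta, r, s):
    """Condition 1 + s > max{ (1 - delta)(1 + r) / (r delta), 1 / delta }."""
    return 1 + s > max((1 - delta) * (1 + r) / (r * delta), 1 / delta)


def choose_exponents(delta, r, s, D2):
    """Pick h(T) = T**q and m(T) = floor(T**x) such that every term of
    Theorem 2.A decays polynomially; return q, x and the exponents of T."""
    if not corollary_2b_holds(delta, r, s):
        raise ValueError("condition of Corollary 2.B violated")
    q_low = max(0.0, (1 - delta) / r - delta)   # makes T/w * alpha(2^D2/(w h)) vanish
    q_high = (1 + s) * delta - 1                # leaves room for some x < delta / D2
    q = (q_low + q_high) / 2
    x_low, x_high = (1 + q) / ((1 + s) * D2), delta / D2
    x = (x_low + x_high) / 2
    exponents = {
        "T/w * alpha(2^D2/(w h))": 1 - delta - delta * r - q * r,
        "sqrt(T h) * beta(m)": 0.5 * (1 + q - (1 + s) * D2 * x),
        "m^D2 / w": x * D2 - delta,
    }
    return q, x, exponents


print(choose_exponents(1.0, 1.0, 0.5, 1))
```

The condition of the corollary is exactly what makes the interval for $q$ non-empty. The midpoint choices are arbitrary; any $q$ and $x$ strictly inside the two intervals would do.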

Remark 2.C (Convergence to zero, simplified).

(a) By adjusting $m$ and $h(T)$ to the function $\beta$, it can easily be shown that, for $w(T) \asymp T$, the convergence $d_2(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J)) \to 0$ holds under the general prerequisites of Theorem 2.A. This is consistent with Corollary 2.B for $\delta = 1$ (note that the requirements for the functions $\alpha$ and $\beta$ are a bit stronger in Corollary 2.B).

(b) It follows from Corollary 2.B that, for arbitrary $\delta \in (0, 1]$ and for $r \ge 1$, $1 + s > 2/\delta$, we have $d_2(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J)) \to 0$ for $T \to \infty$. These requirements on $\alpha$ and $\beta$ are simpler but stronger. They reflect the case where we refrain from adapting $h(T)$ to the concrete problem and simply set $h(T) = T$.

In the principal Theorem 2.A, two features may seem a little unsatisfactory. Our "discretization depth" $h(T)$ in the $\mathbb R^{D_1}$-directions appears in the term $\sqrt{T h(T)}\, \beta(m)$, which stems from the mixing condition in the $\mathbb R^{D_2}$-directions. As a consequence, a finer discretization could in fact increase the overall upper bound we get for the $d_2$-distance. It might well be that the factor $\sqrt{h(T)}$ is superfluous, but it has not been possible to prove this so far. However, there are other ways in which this problem can be, if not remedied, then at least circumvented, simply by assuming one of the other two mixing conditions.
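Theorem 2.D (Other types of mixing). Suppose that the requirements for Theorem 2.A are met, with the exception that Condition $3\chi$ holds in place of Condition $3\rho$.

(a) Let $\chi$ be $\beta$. Then $d_2(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J))$ has the same order as that stated in Theorem 2.A, except for the term $\sqrt{T h(T)}\, \beta(m)$. That term is replaced by the two terms $\sqrt{T/w(T)}\, \alpha(2^D/w(T))$ and $\sqrt{w(T) T}\, \beta(m)$. Hence [since $h(T) \ge 1$ was arbitrary],

$$d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) = O\biggl( \frac{1}{T^{1/D_2}},\ \log^+\Bigl(\frac{T}{w(T)}\Bigr) \frac{m^{D_2} + 1}{w(T)},\ \sqrt{\frac{T}{w(T)}}\, \alpha\Bigl(\frac{2^D}{w(T)}\Bigr),\ \log^+\Bigl(\frac{T}{w(T)}\Bigr)\, \alpha\Bigl(\frac{2^D (2m+1)^{D_2}}{w(T)}\Bigr),\ \sqrt{w(T) T}\, \beta(m) \biggr)$$

for $T \to \infty$.

(b) Let $\chi$ be $\varphi$. Then $d_2(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J))$ has the same order as that stated in Theorem 2.A, but the term $\sqrt{T h(T)}\, \beta(m)$ can be replaced by $\sqrt{T/w(T)}\, \beta(m)$. Hence, as above,

$$d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) = O\biggl( \frac{1}{T^{1/D_2}},\ \log^+\Bigl(\frac{T}{w(T)}\Bigr) \frac{m^{D_2} + 1}{w(T)},\ \log^+\Bigl(\frac{T}{w(T)}\Bigr)\, \alpha\Bigl(\frac{2^D (2m+1)^{D_2}}{w(T)}\Bigr),\ \sqrt{\frac{T}{w(T)}}\, \beta(m) \biggr)$$

for $T \to \infty$.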


Remark 2.E. In the above theorem, a certain price must be paid for eliminating $h(T)$ from the term that comes from the mixing condition. In statement (a), the upper bound we obtain is in many cases of worse order than the one we get for an optimal choice of $h(T)$ in Theorem 2.A; only for sufficiently high $D_1$ is the upper bound order from Theorem 2.D(a), in general, better. In statement (b), we require a much stronger kind of mixing condition than in Theorems 2.A and 2.D(a).

On the other hand, statement (a) does not require a strictly stronger mixing condition, and statement (b) gives a strictly better upper bound.
Example. A typical choice of parameters for illustrating the above-mentioned points is $\alpha(v) = v$, $\beta(u) = u^{-2D_2}$ and $w(T) = T$. We then immediately get $O(T^{-1/3})$ and $O(T^{-2/3})$ as upper bound orders for the $d_2$-distance under the $\beta$-mixing and $\varphi$-mixing conditions, respectively. Under the $\rho$-mixing condition, solving a little optimization problem yields the order $O(T^{-3/(D_1+6)})$. This is better than the order under $\beta$-mixing for $D_1 < 3$ and worse for $D_1 > 3$.
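The comparison in the Example amounts to comparing exponents of $T$. The short sketch below tabulates the $\rho$-mixing exponent $3/(D_1+6)$ against the $\beta$-mixing exponent $1/3$ stated in the Example. It uses exact rational arithmetic via the standard-library `fractions.Fraction`; the names are hypothetical.

```python
from fractions import Fraction

BETA_MIXING_EXPONENT = Fraction(1, 3)  # order O(T^(-1/3)) under beta-mixing


def rho_mixing_exponent(D1):
    """Exponent e of the order O(T^(-e)) under rho-mixing in the Example: e = 3/(D1 + 6)."""
    return Fraction(3, D1 + 6)


for D1 in range(1, 7):
    e = rho_mixing_exponent(D1)
    if e > BETA_MIXING_EXPONENT:
        verdict = "rho-mixing bound better"
    elif e == BETA_MIXING_EXPONENT:
        verdict = "equal"
    else:
        verdict = "rho-mixing bound worse"
    print(D1, e, verdict)
```

A larger exponent means faster decay, so the crossover at $D_1 = 3$ is where the two orders coincide.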

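2.2.2. Proofs. The following simple lemma will be useful.

Lemma 2.F. For all $\mathbf k, \mathbf l$, we have

$$\nu(C_{\mathbf{kl}}) - \frac{2^{D_2 - 2}}{w(T)h(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr) \le p_{\mathbf{kl}} \le \nu(C_{\mathbf{kl}}).$$

Proof. The second inequality is immediate. Using $r - 1 \le r^2/4$ for $r \ge 2$, the first one is obtained as

$$\begin{aligned} \nu(C_{\mathbf{kl}}) - p_{\mathbf{kl}} &= \mathbb E \xi(C_{\mathbf{kl}}) - P[\xi(C_{\mathbf{kl}}) \ge 1] = \sum_{r=2}^{\infty} (r - 1)\, P[\xi(C_{\mathbf{kl}}) = r] \\ &\le \frac14\, \mathbb E\bigl[(\xi(C_{\mathbf{kl}}))^2\, 1\{\xi(C_{\mathbf{kl}}) \ge 2\}\bigr] \le \frac{2^{D_2 - 2}}{w(T)h(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr) \end{aligned}$$

by the orderliness condition with $v(C_{\mathbf{kl}}) \le 2^{D_2}/(w(T)h(T))$. □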
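The middle inequality in the proof rests on $r - 1 \le r^2/4$ for $r \ge 2$, which is just $(r-2)^2 \ge 0$. As a numerical sanity check, the sketch below evaluates both ends of that inequality for a Poisson-distributed count. The Poisson assumption is made purely for illustration; Lemma 2.F does not require it. The name `lemma_2f_sides` is hypothetical.

```python
import math


def lemma_2f_sides(mean, kmax=200):
    """For a count N ~ Po(mean), return (E N - P[N >= 1], E[N^2 1{N >= 2}] / 4),
    the two ends of the middle inequality in the proof of Lemma 2.F
    (series truncated at kmax)."""
    p = [math.exp(-mean)]
    for k in range(kmax):
        p.append(p[-1] * mean / (k + 1))
    gap = sum((r - 1) * p[r] for r in range(2, kmax + 1))        # E N - P[N >= 1]
    bound = 0.25 * sum(r * r * p[r] for r in range(2, kmax + 1))  # E[N^2 1{N >= 2}] / 4
    return gap, bound


for mean in (0.01, 0.1, 1.0, 3.0):
    print(mean, lemma_2f_sides(mean))
```

For a Poisson count the left-hand side also has the closed form $\mathrm{mean} - (1 - e^{-\mathrm{mean}})$, which the computed series reproduces.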

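Proof of Theorem 2.A. We use the notation introduced in Section 2.1; in particular, we write

$$\Xi := \sum_{\mathbf k, \mathbf l} I_{\mathbf{kl}}\, \delta_{\alpha_{\mathbf{kl}}} \quad \text{and} \quad \mathrm H := \sum_{\mathbf k, \mathbf l} \zeta_{\mathbf{kl}}\, \delta_{\alpha_{\mathbf{kl}}}$$

for the discretized point processes, where the $\zeta_{\mathbf{kl}}$ are independent $\mathrm{Po}(p_{\mathbf{kl}})$-variables for $0 \le \mathbf k \le 2n_1 + 1$, $0 \le \mathbf l \le 2n_2 + 1$.

The overall $d_2$-distance can now be split up accordingly:

$$\begin{aligned} d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) \le{}& d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\Xi\theta_T^{-1})\bigr) \\ &+ d_2\bigl(\mathcal L(\Xi\theta_T^{-1}), \mathcal L(\mathrm H\theta_T^{-1})\bigr) + d_2\bigl(\mathcal L(\mathrm H\theta_T^{-1}), \mathcal L(\eta\theta_T^{-1}|_J)\bigr). \end{aligned} \tag{2.1}$$

We first take a look at the discretization errors. For the $\xi$-discretization we can obtain, via the Kantorovich-Rubinstein equation (1.2),

$$\begin{aligned} d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\Xi\theta_T^{-1})\bigr) &\le \mathbb E\, d_1\bigl(\xi\theta_T^{-1}|_J, \Xi\theta_T^{-1}\bigr) \\ &\le \mathbb E\bigl[d_1\bigl(\xi\theta_T^{-1}|_J, \Xi\theta_T^{-1}\bigr)\, 1\{\xi\theta_T^{-1}(J) = W(T)\}\bigr] + P\bigl[\xi\theta_T^{-1}(J) \ne W(T)\bigr]. \end{aligned} \tag{2.2}$$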
( 2 . 2 ) 
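The second summand can easily be estimated as follows:

$$\begin{aligned} P\bigl[\xi\theta_T^{-1}(J) \ne W(T)\bigr] &= P\Bigl[\bigcup_{\mathbf k, \mathbf l} \{\xi(C_{\mathbf{kl}}) \ge 2\}\Bigr] \le \sum_{\mathbf k, \mathbf l} P[\xi(C_{\mathbf{kl}}) \ge 2] \\ &\le \frac14 \sum_{\mathbf k, \mathbf l} \mathbb E\bigl[(\xi(C_{\mathbf{kl}}))^2\, 1\{\xi(C_{\mathbf{kl}}) \ge 2\}\bigr] \le 2^{2D + D_2 - 2}\, \frac{T}{w(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr) \end{aligned} \tag{2.3}$$

The last step uses the orderliness condition with $v(C_{\mathbf{kl}}) \le 2^{D_2}/(w(T)h(T))$. It also uses the fact that there are $(2n_1 + 2)^{D_1}(2n_2 + 2)^{D_2} \le 4^D h(T)\, T$ cuboids.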

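In order to estimate the first summand in (2.2), we use the representation of the $d_1$-distance given by (1.1). Let $X_1, \ldots, X_{\xi\theta_T^{-1}(J)}$ be the points of $\xi\theta_T^{-1}|_J$ and $Y_1, \ldots, Y_{W(T)}$ the points of $\Xi\theta_T^{-1}$. Suppose w.l.o.g. that they are numbered in an optimal way on $\{\xi\theta_T^{-1}(J) = W(T)\}$, that is, in such a way that $Y_i$ is the centre $\beta_{\mathbf{kl}}$ of the cuboid $R_{\mathbf{kl}}$ which contains $X_i$. In the transition from $\xi$ to $\Xi$ we do not move the points any farther than half a body diagonal of a cuboid $R_{\mathbf{kl}}$. Thus, by (1.1),

$$\begin{aligned} \mathbb E\bigl[d_1\bigl(\xi\theta_T^{-1}|_J, \Xi\theta_T^{-1}\bigr)\, 1\{\xi\theta_T^{-1}(J) = W(T)\}\bigr] &\le \mathbb E\Bigl[\frac{1}{W(T)} \sum_{i=1}^{W(T)} d_0(X_i, Y_i)\, 1\{\xi\theta_T^{-1}(J) = W(T) \ge 1\}\Bigr] \\ &\le \frac12 \Bigl( \frac{\sqrt{D_1}}{h(T)^{1/D_1}} + \frac{\sqrt{D_2}}{T^{1/D_2}} \Bigr) P\bigl[\xi\theta_T^{-1}(J) = W(T) \ge 1\bigr] \\ &\le \frac12 \Bigl( \frac{\sqrt{D_1}}{h(T)^{1/D_1}} + \frac{\sqrt{D_2}}{T^{1/D_2}} \Bigr), \end{aligned} \tag{2.4}$$

whence we get for the total $\xi$-discretization error

$$d_2\bigl(\mathcal L(\xi\theta_T^{-1}|_J), \mathcal L(\Xi\theta_T^{-1})\bigr) \le \frac12 \Bigl( \frac{\sqrt{D_1}}{h(T)^{1/D_1}} + \frac{\sqrt{D_2}}{T^{1/D_2}} \Bigr) + 2^{2D + D_2 - 2}\, \frac{T}{w(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr).$$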
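The bound in (2.4) replaces the exact half body diagonal of an image cuboid $R_{\mathbf{kl}}$ by the simpler expression $\frac12(\sqrt{D_1}/h(T)^{1/D_1} + \sqrt{D_2}/T^{1/D_2})$, using $\sqrt{a + b} \le \sqrt a + \sqrt b$. The small sketch below compares the two; the helper names are hypothetical. Note that $d_0$ is truncated at 1 anyway, so only values below 1 matter.

```python
import math


def half_diagonal(h_T, T, D1, D2):
    """Exact half body diagonal of an untruncated image cuboid R_kl, whose sides are
    h(T)**(-1/D1) (in D1 directions) and T**(-1/D2) (in D2 directions)."""
    return 0.5 * math.sqrt(D1 * h_T ** (-2.0 / D1) + D2 * T ** (-2.0 / D2))


def half_diagonal_bound(h_T, T, D1, D2):
    """The bound (1/2) * (sqrt(D1) / h(T)**(1/D1) + sqrt(D2) / T**(1/D2)) used in (2.4)."""
    return 0.5 * (math.sqrt(D1) / h_T ** (1.0 / D1) + math.sqrt(D2) / T ** (1.0 / D2))


print(half_diagonal(16.0, 100.0, 2, 1), half_diagonal_bound(16.0, 100.0, 2, 1))
```

Both quantities decrease to zero as $h(T)$ and $T$ grow, which is why the discretization error vanishes when $h(T) \to \infty$.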


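Next we consider the discretization error for $\eta$. Let $\mathrm H' := \sum_{\mathbf k, \mathbf l} \zeta'_{\mathbf{kl}}\, \delta_{\alpha_{\mathbf{kl}}}$ with $\zeta'_{\mathbf{kl}} := \eta(C_{\mathbf{kl}})$. We split up the error as

$$d_2\bigl(\mathcal L(\mathrm H\theta_T^{-1}), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) \le d_2\bigl(\mathcal L(\mathrm H\theta_T^{-1}), \mathcal L(\mathrm H'\theta_T^{-1})\bigr) + d_2\bigl(\mathcal L(\mathrm H'\theta_T^{-1}), \mathcal L(\eta\theta_T^{-1}|_J)\bigr). \tag{2.5}$$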

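The first summand gives us a little more trouble. For any two point processes $\xi_1$ and $\xi_2$ on a compact set $\mathcal X$, the inequality

$$\mathbb E\, d_1(\xi_1, \xi_2) = \mathbb E\bigl(d_1(\xi_1, \xi_2)\, 1\{\xi_1 \ne \xi_2\}\bigr) \le P[\xi_1 \ne \xi_2]$$

holds. Combining this with (1.2) and the analogue of (1.4) for probability distributions on more general spaces [see Barbour, Holst and Janson (1992), Appendix A.1], we see that

$$d_2(P, Q) \le d_{TV}(P, Q)$$

for any distributions $P, Q$ of point processes on $\mathcal X$. Hence, by another application of the more general version of (1.4) in the second inequality,

$$\begin{aligned} d_2\bigl(\mathcal L(\mathrm H\theta_T^{-1}), \mathcal L(\mathrm H'\theta_T^{-1})\bigr) &\le d_{TV}\bigl(\mathcal L(\mathrm H\theta_T^{-1}), \mathcal L(\mathrm H'\theta_T^{-1})\bigr) \\ &\le \min_{\zeta_{\mathbf{kl}} \sim \mathrm{Po}(p_{\mathbf{kl}}),\ \zeta'_{\mathbf{kl}} \sim \mathrm{Po}(\nu(C_{\mathbf{kl}}))} \sum_{\mathbf k, \mathbf l} P[\zeta_{\mathbf{kl}} \ne \zeta'_{\mathbf{kl}}] \\ &= \sum_{\mathbf k, \mathbf l} d_{TV}\bigl(\mathrm{Po}(p_{\mathbf{kl}}), \mathrm{Po}(\nu(C_{\mathbf{kl}}))\bigr) \\ &\le \sum_{\mathbf k, \mathbf l} \bigl(\nu(C_{\mathbf{kl}}) - p_{\mathbf{kl}}\bigr) \\ &\le 2^{2D + D_2 - 2}\, \frac{T}{w(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr), \end{aligned} \tag{2.6}$$

where the last two inequalities follow from Proposition A.C and Lemma 2.F, respectively. For the second summand in (2.5), we obtain

$$\begin{aligned} d_2\bigl(\mathcal L(\mathrm H'\theta_T^{-1}), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) &\le \mathbb E\, d_1\bigl(\mathrm H'\theta_T^{-1}, \eta\theta_T^{-1}|_J\bigr) \\ &= \mathbb E\bigl[d_1\bigl(\mathrm H'\theta_T^{-1}, \eta\theta_T^{-1}|_J\bigr)\, 1\{\mathrm H'\theta_T^{-1}(J) = \eta\theta_T^{-1}(J) \ge 1\}\bigr] \\ &\le \frac12 \Bigl( \frac{\sqrt{D_1}}{h(T)^{1/D_1}} + \frac{\sqrt{D_2}}{T^{1/D_2}} \Bigr) \end{aligned}$$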


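by the same argument that was used in (2.4). So, an estimate for the total $\eta$-discretization error is given by

$$d_2\bigl(\mathcal L(\mathrm H\theta_T^{-1}), \mathcal L(\eta\theta_T^{-1}|_J)\bigr) \le \frac12 \Bigl( \frac{\sqrt{D_1}}{h(T)^{1/D_1}} + \frac{\sqrt{D_2}}{T^{1/D_2}} \Bigr) + 2^{2D + D_2 - 2}\, \frac{T}{w(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr).$$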

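Last, we look at the remaining term $d_2(\mathcal L(\Xi\theta_T^{-1}), \mathcal L(\mathrm H\theta_T^{-1}))$, which is perfect for the application of a Stein estimate. In the notation of the Appendix we write

$$\Gamma = \{0, 1, \ldots, 2n_1 + 1\}^{D_1} \times \{0, 1, \ldots, 2n_2 + 1\}^{D_2}$$

[accordingly, we write elements of $\Gamma$ as $(\mathbf i, \mathbf j)$, meaning $\mathbf i \in \{0, 1, \ldots, 2n_1 + 1\}^{D_1}$, $\mathbf j \in \{0, 1, \ldots, 2n_2 + 1\}^{D_2}$]. For the sets of strongly and weakly dependent indicators, respectively, we write

$$\Gamma^s_{\mathbf{kl}} = \bigl\{(\mathbf i, \mathbf j) \in \Gamma \setminus \{(\mathbf k, \mathbf l)\};\ |\mathbf j - \mathbf l| \le m\bigr\}, \qquad \Gamma^w_{\mathbf{kl}} = \bigl\{(\mathbf i, \mathbf j) \in \Gamma \setminus \{(\mathbf k, \mathbf l)\};\ |\mathbf j - \mathbf l| \ge m + 1\bigr\}$$

for every $\mathbf k, \mathbf l$. Here $|\mathbf j - \mathbf l| := \max_{1 \le s \le D_2} |j_s - l_s|$, and $m := m(T) \in \mathbb Z_+$ is chosen arbitrarily for every $T \ge 1$. We can assume w.l.o.g. that $m \le 2n_2 + 1$ [note that for $m \ge 2n_2 + 1$ we have $e_{\mathbf{kl}} = 0$, so that (2.9) below is still true]. As in the Appendix, we set

$$Z_{\mathbf{kl}} := \sum_{(\mathbf i, \mathbf j) \in \Gamma^s_{\mathbf{kl}}} I_{\mathbf{ij}}, \qquad Y_{\mathbf{kl}} := \sum_{(\mathbf i, \mathbf j) \in \Gamma^w_{\mathbf{kl}}} I_{\mathbf{ij}}.$$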
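From the local Stein Theorem A.D for point processes we know that

$$\begin{aligned} d_2\bigl(\mathcal L(\Xi\theta_T^{-1}), \mathcal L(\mathrm H\theta_T^{-1})\bigr) \le{}& \Bigl(1 \wedge \frac{2}{\lambda}\Bigl(1 + 2\bigl(\log(\lambda/2) \vee 0\bigr)\Bigr)\Bigr) \sum_{\mathbf k, \mathbf l} \bigl(p_{\mathbf{kl}}^2 + p_{\mathbf{kl}}\, \mathbb E Z_{\mathbf{kl}} + \mathbb E(I_{\mathbf{kl}} Z_{\mathbf{kl}})\bigr) \\ &+ \Bigl(1 \wedge \frac{1.65}{\sqrt\lambda}\Bigr) \sum_{\mathbf k, \mathbf l} e_{\mathbf{kl}} \end{aligned} \tag{2.8}$$

with

$$e_{\mathbf{kl}} = 2 \max_{B \in \sigma(I_{\mathbf{ij}};\ (\mathbf i, \mathbf j) \in \Gamma^w_{\mathbf{kl}})} \bigl|\mathrm{cov}(I_{\mathbf{kl}}, 1_B)\bigr|.$$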
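The two factors in front of the sums in the Stein estimate are easy to evaluate numerically. The sketch below, with the hypothetical name `stein_factors`, computes them for a given $\lambda > 0$. It uses the positive part $\max(\log(\lambda/2), 0)$ of the logarithm. For large $\lambda$ the factors behave like $\log(\lambda)/\lambda$ and $\lambda^{-1/2}$ respectively, which is where the $\log^+$ and square-root terms in Theorem 2.A come from.

```python
import math


def stein_factors(lam):
    """Return (1 ∧ (2/lam)(1 + 2 max(log(lam/2), 0)), 1 ∧ 1.65/sqrt(lam)) for lam > 0."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    f1 = min(1.0, (2.0 / lam) * (1.0 + 2.0 * max(math.log(lam / 2.0), 0.0)))
    f2 = min(1.0, 1.65 / math.sqrt(lam))
    return f1, f2


for lam in (0.5, 10.0, 100.0, 1e4):
    print(lam, stein_factors(lam))
```

Both factors are capped at 1, so for small $\lambda$ the estimate reduces to the plain sums.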

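Starting from the right-hand side, most further estimates are very easy. First, we have

$$p_{\mathbf{kl}} \le \nu(C_{\mathbf{kl}}) \le \kappa_T\, \frac{1}{w(T)h(T)}$$

and

$$\mathbb E Z_{\mathbf{kl}} = \sum_{(\mathbf i, \mathbf j) \in \Gamma^s_{\mathbf{kl}}} p_{\mathbf{ij}} \le \kappa_T \bigl[(2n_1 + 2)^{D_1} (2m + 1)^{D_2} - 1\bigr] \frac{1}{w(T)h(T)}.$$

Furthermore, by the mixing condition (using also $\sqrt{P[B](1 - P[B])} \le 1/2$),

$$e_{\mathbf{kl}} = 2 \sqrt{p_{\mathbf{kl}}(1 - p_{\mathbf{kl}})}\, \max_{B} \sqrt{P[B](1 - P[B])}\, \bigl|\mathrm{corr}(I_{\mathbf{kl}}, 1_B)\bigr| \le \sqrt{\kappa_T\, \frac{1}{w(T)h(T)}}\; \beta(m). \tag{2.9}$$

Finally, by Lemma 2.F and since $(2n_1 + 2)^{D_1}(2n_2 + 2)^{D_2} \le 4^D h(T)\, T$,

$$\begin{aligned} \lambda = \sum_{\mathbf k, \mathbf l} p_{\mathbf{kl}} &\ge \sum_{\mathbf k, \mathbf l} \nu(C_{\mathbf{kl}}) - \sum_{\mathbf k, \mathbf l} \frac{2^{D_2 - 2}}{w(T)h(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr) \\ &= \nu(J_T) - (2n_1 + 2)^{D_1}(2n_2 + 2)^{D_2}\, \frac{2^{D_2 - 2}}{w(T)h(T)}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr) \\ &\ge 2^D\, \frac{T}{w(T)} \Bigl( \iota_T - 2^{D + D_2 - 2}\, \alpha\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr) \Bigr), \end{aligned}$$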
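whence we get a "magic factor" estimate of

$$\frac{1}{\lambda} \le \bigl(1 + \varepsilon(T)\bigr) \frac{w(T)}{2^D \iota_T T}$$

with

$$\varepsilon(T) := \begin{cases} \Bigl( 1 - 2^{D + D_2 - 2}\, \dfrac{1}{\iota_T}\, \alpha\Bigl(\dfrac{2^{D_2}}{w(T)h(T)}\Bigr) \Bigr)^{-1} - 1 & \text{if } 1 - 2^{D + D_2 - 2}\, \dfrac{1}{\iota_T}\, \alpha\Bigl(\dfrac{2^{D_2}}{w(T)h(T)}\Bigr) > 0, \\[6pt] \infty & \text{otherwise.} \end{cases}$$

Provided that $\iota > 0$, this is an expression of order $O\bigl(\alpha(2^{D_2}/(w(T)h(T)))\bigr)$ for $T \to \infty$.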
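The quantity $\varepsilon(T)$ and the resulting bound on $1/\lambda$ can be evaluated directly from their definitions. In the sketch below (hypothetical names), `alpha_value` stands for $\alpha(2^{D_2}/(w(T)h(T)))$, and `iota_T` for the infimum of the density on $J_T$. Both must be supplied by the caller.

```python
import math


def epsilon_T(iota_T, alpha_value, D, D2):
    """epsilon(T) = (1 - 2^(D+D2-2) * alpha_value / iota_T)^(-1) - 1 if the bracket is
    positive, and +infinity otherwise."""
    bracket = 1.0 - 2.0 ** (D + D2 - 2) * alpha_value / iota_T
    return 1.0 / bracket - 1.0 if bracket > 0 else math.inf


def inverse_lambda_bound(T, w_T, D, iota_T, eps):
    """Upper bound (1 + eps) * w(T) / (2^D * iota_T * T) for 1/lambda."""
    return (1.0 + eps) * w_T / (2.0 ** D * iota_T * T)


eps = epsilon_T(iota_T=0.5, alpha_value=1e-3, D=2, D2=1)
print(eps, inverse_lambda_bound(T=1e4, w_T=100.0, D=2, iota_T=0.5, eps=eps))
```

For small `alpha_value`, $\varepsilon(T) \approx 2^{D+D_2-2}\, \alpha / \iota_T$, which is the stated order $O(\alpha(2^{D_2}/(w(T)h(T))))$.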

For the remaining term, $\mathbb E\bigl(\sum_{\mathbf k, \mathbf l} I_{\mathbf{kl}} Z_{\mathbf{kl}}\bigr)$, a little trick is required. We subdivide the set $\Gamma = \{0, 1, \ldots, 2n_1 + 1\}^{D_1} \times \{0, 1, \ldots, 2n_2 + 1\}^{D_2}$ along the last $D_2$ dimensions into $D_2$-cube sections of extension $2m + 1$ in every dimension (except for possible leftover cuboids). We then look at the individual sections separately. Let $\mathbf s = (s_1, s_2, \ldots, s_{D_2}) \in \{1, 2, \ldots, \lceil (2n_2 + 2)/(2m + 1) \rceil\}^{D_2}$. The $\mathbf s$th section is the section containing the $s_j$th collection of $2m + 1$ numbers in the $j$th coordinate. For it we set

$$c^{(1)}(\mathbf s) := c^{(1)}(\mathbf s, m) := \bigl(c^{(1)}_1(\mathbf s), \ldots, c^{(1)}_{D_2}(\mathbf s)\bigr) := \bigl((s_1 - 1)(2m + 1), \ldots, (s_{D_2} - 1)(2m + 1)\bigr),$$

which is the "lower left" corner index (the multi-index that is in each coordinate minimal among all indices belonging to the $\mathbf s$th section), and

$$c^{(2)}(\mathbf s) := c^{(2)}(\mathbf s, m) := \bigl(c^{(2)}_1(\mathbf s), \ldots, c^{(2)}_{D_2}(\mathbf s)\bigr) := \bigl([s_1(2m + 1) - 1] \wedge (2n_2 + 1), \ldots, [s_{D_2}(2m + 1) - 1] \wedge (2n_2 + 1)\bigr),$$

which is the "upper right" corner index (the multi-index that is in each coordinate maximal among all indices belonging to the $\mathbf s$th section). Furthermore, we set

$$D_{\mathbf s}^{(m)} := \bigcup_{\mathbf i = 0}^{2n_1 + 1}\ \bigcup_{\mathbf j = [c^{(1)}(\mathbf s) - m] \vee 0}^{[c^{(2)}(\mathbf s) + m] \wedge (2n_2 + 1)} C_{\mathbf{ij}},$$

the subset of $J_T$ that naturally belongs to the $m$-neighborhood cube of the $\mathbf s$th section. Using our usual multi-index notation and index range convention for sums, we now obtain for the remaining term

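$$
\begin{aligned}
\sum_{k,l}\mathbb{E}(I_{kl}Z_{kl})
&=\mathbb{E}\Biggl(\sum_{k=0}^{2n_1+1}\sum_{l=0}^{2n_2+1}\ \sum_{i=0}^{2n_1+1}\ \sum_{\substack{j=(l-m)\vee 0\\(i,j)\neq(k,l)}}^{(l+m)\wedge(2n_2+1)}I_{kl}I_{ij}\Biggr)\\
&\le\mathbb{E}\Biggl(\sum_{s=1}^{\lceil(2n_2+2)/(2m+1)\rceil}\ \sum_{k=0}^{2n_1+1}\ \sum_{l=c^{(1)}(s)}^{c^{(2)}(s)}\ \sum_{i=0}^{2n_1+1}\ \sum_{\substack{j=[c^{(1)}(s)-m]\vee 0\\(i,j)\neq(k,l)}}^{[c^{(2)}(s)+m]\wedge(2n_2+1)}I_{kl}I_{ij}\Biggr)\\
&\le\sum_{s=1}^{\lceil(2n_2+2)/(2m+1)\rceil}\mathbb{E}\Biggl[\Biggl(\sum_{i=0}^{2n_1+1}\ \sum_{j=[c^{(1)}(s)-m]\vee 0}^{[c^{(2)}(s)+m]\wedge(2n_2+1)}I_{ij}\Biggr)^2 1\Bigl\{\sum_i\sum_j I_{ij}\ge 2\Bigr\}\Biggr]\\
&\le\sum_{s=1}^{\lceil(2n_2+2)/(2m+1)\rceil}\mathbb{E}\Bigl[\bigl(\xi(D^{(m)}_s)\bigr)^2 1\{\xi(D^{(m)}_s)\ge 2\}\Bigr]\\
&\le 2^{D+D_2}\bigl(T^{1/D_2}+m+1\bigr)^{D_2}\,\frac{1}{w(T)}\,a\Bigl(\frac{2^D(2m+1)^{D_2}}{w(T)}\Bigr)
\end{aligned}
$$

by the orderliness condition with $\nu(D^{(m)}_s)\le 2^D(2m+1)^{D_2}/w(T)$.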
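The index bookkeeping behind the sections can be checked mechanically. The following Python sketch (the names `sections`, `inside` and `check` are hypothetical, chosen only for this illustration) enumerates the corner indices $c^{(1)}(s)$, $c^{(2)}(s)$ and the clipped index ranges of the $m$-neighborhoods $D^{(m)}_s$ in the last $D_2$ coordinates. It then verifies three facts the estimate relies on: every index belongs to exactly one section; every $j$ with $|j-l|_\infty\le m$ lies in the neighborhood of the section containing $l$; and each neighborhood spans at most $4m+1\le 2(2m+1)$ indices per coordinate. It assumes that the strong-dependence neighborhood of $(k,l)$ consists of the indices whose last $D_2$ coordinates differ from $l$ by at most $m$ in every coordinate, as the index ranges of the first sum suggest.

```python
import itertools
import math


def sections(n2, m, D2):
    """Yield (s, c1, c2, lo, hi) for every D2-cube section of {0, ..., 2*n2+1}^D2.

    c1, c2 are the lower-left and upper-right corner indices of section s;
    lo, hi bound the m-neighbourhood of the section, clipped to the index range.
    """
    top = 2 * n2 + 1
    n_sec = math.ceil((2 * n2 + 2) / (2 * m + 1))
    for s in itertools.product(range(1, n_sec + 1), repeat=D2):
        c1 = tuple((sj - 1) * (2 * m + 1) for sj in s)
        c2 = tuple(min(sj * (2 * m + 1) - 1, top) for sj in s)
        lo = tuple(max(c - m, 0) for c in c1)
        hi = tuple(min(c + m, top) for c in c2)
        yield s, c1, c2, lo, hi


def inside(idx, lo, hi):
    """True if the multi-index idx lies in the box [lo, hi] (coordinatewise)."""
    return all(a <= x <= b for x, a, b in zip(idx, lo, hi))


def check(n2, m, D2):
    """Verify the covering properties used in the chain of inequalities."""
    top = 2 * n2 + 1
    secs = list(sections(n2, m, D2))
    grid = list(itertools.product(range(top + 1), repeat=D2))
    for l in grid:
        # every index l lies in exactly one section ...
        owners = [sec for sec in secs if inside(l, sec[1], sec[2])]
        if len(owners) != 1:
            return False
        lo, hi = owners[0][3], owners[0][4]
        # ... and every j with |j - l|_inf <= m lies in that section's neighbourhood
        for j in grid:
            if max(abs(a - b) for a, b in zip(j, l)) <= m and not inside(j, lo, hi):
                return False
    # each neighbourhood spans at most 4m + 1 <= 2(2m + 1) indices per coordinate
    return all(b - a + 1 <= 2 * (2 * m + 1)
               for *_, lo, hi in secs for a, b in zip(lo, hi))
```

For example, `check(4, 1, 2)` returns `True`; the number of sections is $\lceil(2n_2+2)/(2m+1)\rceil^{D_2}$, here $4^2=16$.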

All that is left to do now is to combine the various estimates for the right-hand side terms of the Stein inequality (2.8). Then, adding the discretization errors and setting


L(T) := 1A 


2(1 + e(T)) 


MTV 

2 d l t T_ 


(l + 21„ g+ (2-V T ^T_)) 


yields for the overall $d_2$-distance


VDi , 


< 




2D+2 Dx 2 T(2m + 1) D2 


_)_ <22D+D2 — 1 _ 


2°2 \ 


(w(T)Y 


(2.10) ' “ w(T) a \w(T)h(T) j 

, (t) ,d + d, T 1/i>2 +mtl) 6 A(!m+A\ 

+ L{T>2 w(T) A w(T) J 


+ n^ 1 + s(T)^y°^^ 


h(T) 


Tf3(m). 
For $l>0$ and preferably $T$ large enough, we get the rougher, but less nasty looking upper bound

d 2 (/W 1 |j),/W 1 |j)) 


< 


'/Di , \[Th 


( 2 . 11 ) 


h(T) l / Dl T l / D2 

+ 2 D+2Dl+2 — {l + e(T)) log T (2°-^-^—) 
l \ w(T)J 


_! T \(2m + l) 


D 2 


I 2 2D +D 2 -l m 


J-&( 2D2 ) 

w{T) \w(T)h(T)) 


+ 2 D2+2 5 D2 ^(l + e(T))log T (2 D ~ l f 


+ 22 D + 1 J-^/l + e(T)^/Th(T)P(m), 


T 


w(T) 


a 2 


w(T) 

D (2m + 1 ) D2 

w(T) 


which is of the required order. □ 

Proof of Corollary 2.B. For $T>1$, we have to find $h(T)>1$ and $m:=m(T)\in\mathbb{Z}_+$ such that all six terms on the right-hand side of the equality in Theorem 2.A go to zero as $T\to\infty$. We set $h(T)=T^q$ and $m:=\lfloor T^x\rfloor$, with $q>0$ and $0<x<\delta/D_2$. Thus,

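$$\frac{1}{h(T)^{1/D_1}}\to 0,\qquad \log\Bigl(\frac{T}{w(T)}\Bigr)\frac{m^{D_2}+1}{w(T)}\to 0,\qquad \frac{1}{T^{1/D_2}}\to 0\quad\text{and}\quad \log\Bigl(\frac{T}{w(T)}\Bigr)\,a\Bigl(\frac{2^D(2m+1)^{D_2}}{w(T)}\Bigr)\to 0;$$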


so the only two terms we have to worry about are

$$\frac{T}{w(T)}\,a\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr)=O\bigl(T^{1-\delta-\delta r-qr}\bigr)\quad\text{and}\quad\sqrt{Th(T)}\,\beta(m)=o\bigl(T^{(1+q-(1+s)D_2x)/2}\bigr),$$

which both converge to zero if there exist $q>0$ and $0<x<\delta/D_2$ such that

$$q>\frac{1-\delta-\delta r}{r}\quad\text{and}\quad q<(1+s)D_2x-1.$$

This last is true provided that

$$(1+s)\delta-1>\max\Bigl\{\frac{1-\delta-\delta r}{r},\ 0\Bigr\},$$

whence we obtain the statement. □
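The choice of the exponents can be checked mechanically. The Python sketch below (function names hypothetical) assumes that $x$ ranges over $(0,\delta/D_2)$, the range that turns $q<(1+s)D_2x-1$ into the final condition $(1+s)\delta-1>\max\{(1-\delta-\delta r)/r,0\}$. It picks $x$ just below $\delta/D_2$ and $q$ in the middle of the admissible interval, and reports the exponents of $T$ in the two remaining terms. Both must be negative.

```python
def choose_exponents(delta, r, s, D2):
    """Try to find q > 0 and 0 < x < delta/D2 with
    q > (1 - delta - delta*r)/r  and  q < (1 + s)*D2*x - 1.
    Returns (q, x), or None if no such pair is found."""
    lower = max((1 - delta - delta * r) / r, 0.0)
    for theta in (0.9, 0.99, 0.999, 0.9999):
        x = theta * delta / D2            # stay strictly below delta/D2
        upper = (1 + s) * D2 * x - 1
        if upper > lower:
            return (lower + upper) / 2, x  # midpoint of the admissible q-interval
    return None


def exponents(delta, r, s, D2, q, x):
    """Exponents of T in (T/w) a(2^D2/(w h)) and sqrt(T h) beta(m)."""
    e_orderliness = 1 - delta - delta * r - q * r
    e_mixing = (1 + q - (1 + s) * D2 * x) / 2
    return e_orderliness, e_mixing
```

With $\delta=r=s=D_2=1$ this returns $q=0.4$, $x=0.9$ and the exponents $-1.4$ and $-0.2$. With $\delta=0.5$, $r=s=D_2=1$ the condition $(1+s)\delta-1>0$ fails and the function returns `None`.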


Proof of Theorem 2.D. Since the mixing condition is used only once in the proof of Theorem 2.A, namely, in (2.9) for obtaining the upper bound of the $e_{kl}$ from the Stein estimate, we can simply transfer the proof and recalculate this upper bound under our new mixing conditions.

(a) Let $l\in\{0,1,\dots,2n_2+1\}^{D_2}$ be fixed, set $C_{\cdot l}:=\bigcup_{k=0}^{2n_1+1}C_{kl}$ and define

4!!, := (Jii; i € {0,1, • • •, 2 n 1 + l} Dl ), Enl := a(X^), 

'■= (An (bj) € r ki) regardless of k, : = 

Note that 7$ C E2 := <r(f IcJ and Ext C Ext := . c ), regard- 

less of k. It is seen for every k E {0,1,..., 2ni + l} Dl that 
e k i = 2 max | cov(/ k i, 1 B )| 

= 2 max |P[B n {4i = 1}] - P[5]P[7 kl = 1] | 

< 2 max |P [B D {X® = x k }] - F[B]F[X® = x k ]| 

Ref' 1 ) 


+ 2 max 


Bar 


(i) 


5n{/ k i = i}n^/ n >2 


-P[B]P 


{/ k i = i}n^/ ; ,>2 


where $x_k$ is the element of $\{0,1\}^{\{0,1,\dots,2n_1+1\}^{D_1}}$ which has a 1 in the $k$th and a 0 in every other component. We denote the first summand by $A_{kl}$, the second by $B_{kl}$ and look at the sums over $k$ separately. For the $A_{kl}$-sum we obtain

2ni+l 2ni+l 

E ^ kl = 2 E max |P[B|x2 = x k ]-P[S]|P[x2 = x k ] 
k=o k=o Bar" 

<2E( max |P[B|X®] - P[S]|) 
y Bar£ t / 

= 2/3(ES,E£) < 2/3(E5,E£) < 2/3(m), 


where the monotonicity of the $\beta$-mixing coefficient is immediate if it is written in its dual form as a supremum over measurable partitions [see Doukhan (1994), Section 1.1]. For the $B_{kl}$-sum, the upper bound is obtained by application of the orderliness condition:


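$$\sum_{k=0}^{2n_1+1}B_{kl}\le 4\sum_{k=0}^{2n_1+1}\mathbb{E}\bigl(I_{kl}1\{\xi(C_{\cdot l})\ge 2\}\bigr)\le 2\,\mathbb{E}\bigl[(\xi(C_{\cdot l}))^2 1\{\xi(C_{\cdot l})\ge 2\}\bigr]\le 2^{D+1}\,\frac{1}{w(T)}\,a\Bigl(\frac{2^D}{w(T)}\Bigr).$$

We thus have for the total $e_{kl}$-sum over $k$ the estimate

$$\sum_{k=0}^{2n_1+1}e_{kl}\le 2\beta(m)+2^{D+1}\,\frac{1}{w(T)}\,a\Bigl(\frac{2^D}{w(T)}\Bigr).$$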


(b) In the case of the $\varphi$-mixing condition, the corresponding estimate is very easy. It follows that


e k i = 2 max | cov(/ kl , 1 b )| 




(i) 


= 4 max |P[S|I kl = 1] - P[5]|W k i = 1] 


< 2 /3(m) 


kt 


h(T)w(T)' 


□ 


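2.3. The $d_{TV}$-distance between the numbers of points. Since for every $A\subset\mathbb{Z}_+$ the function $f_A\colon\mathcal{M}_p\to\mathbb{R}_+$ that is defined by $f_A(\varrho):=1[|\varrho|\in A]$ is in $\mathcal{F}_2$, it follows for any two point processes $\xi_1,\xi_2$ on a compact set $\mathcal{X}$ that

$$\bigl|P[\xi_1(\mathcal{X})\in A]-P[\xi_2(\mathcal{X})\in A]\bigr|\le d_2\bigl(\mathcal{L}(\xi_1),\mathcal{L}(\xi_2)\bigr),$$

hence, also

$$d_{TV}\bigl(\mathcal{L}(\xi_1(\mathcal{X})),\mathcal{L}(\xi_2(\mathcal{X}))\bigr)\le d_2\bigl(\mathcal{L}(\xi_1),\mathcal{L}(\xi_2)\bigr).$$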

Thus, the upper bounds we obtained in the theorems of Section 2.2 are also upper bounds for $d_{TV}(\mathcal{L}(\xi\theta_T^{-1}(J)),\mathcal{L}(\eta\theta_T^{-1}(J)))$. However, using the same method as above and making only slight modifications in the proofs, one can do a little better. Note that although we are now only concerned with numbers of points and not with their positions, we can still improve (but possibly also impair, depending on the leading term in our estimate) our upper bound by choosing a finer discretization in the $\mathbb{R}^{D_1}$-directions. This is because the advantage we get from the orderliness condition if we have smaller discretization cuboids outweighs the disadvantage of having more of them.
Theorem 2.G. Suppose that the prerequisites of Section 1 hold, including the Conditions 1, 2 and $3_\rho$, and let $l>0$.

Then we obtain for arbitrary $m:=m(T)\in\mathbb{Z}_+$ and $h(T)>1$ for every $T>1$:


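$$d_{TV}\bigl(\mathcal{L}(\xi\theta_T^{-1}(J)),\mathcal{L}(\eta\theta_T^{-1}(J))\bigr)=O\Bigl(\max\Bigl\{\frac{m^{D_2}+1}{w(T)},\ \frac{T}{w(T)}\,a\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr),\ a\Bigl(\frac{2^D(2m+1)^{D_2}}{w(T)}\Bigr),\ \sqrt{Th(T)}\,\beta(m)\Bigr\}\Bigr)$$

for $T\to\infty$.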


Remark 2.H. Of course, all theorems stated in Section 2.2 have their equivalents for the $d_{TV}$-distance between the distributions of the numbers of points. The corresponding upper bounds can simply be obtained by leaving out the logarithmic factors, as well as the terms

$$\frac{1}{h(T)^{1/D_1}}\quad\text{and}\quad\frac{1}{T^{1/D_2}}.$$

Note, however, that the conditions in Corollary 2.B for convergence to zero of the principal upper bound remain unchanged.


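Proof of Theorem 2.G. Although our task now seems to be quite different, we can proceed exactly as we did in the proof of Theorem 2.A. First, we split up the distance as

$$
\begin{aligned}
d_{TV}\bigl(\mathcal{L}(\xi(J_T)),\mathcal{L}(\eta(J_T))\bigr)&=d_{TV}\bigl(\mathcal{L}(\xi(J_T)),\mathrm{Po}(\nu(J_T))\bigr)\\
&\le d_{TV}\bigl(\mathcal{L}(\xi(J_T)),\mathcal{L}(W)\bigr)+d_{TV}\bigl(\mathcal{L}(W),\mathrm{Po}(\lambda)\bigr)+d_{TV}\bigl(\mathrm{Po}(\lambda),\mathrm{Po}(\nu(J_T))\bigr).
\end{aligned}
$$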

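Here the two discretization errors can be estimated very easily. By the orderliness condition, we obtain

$$
\begin{aligned}
d_{TV}\bigl(\mathcal{L}(\xi(J_T)),\mathcal{L}(W)\bigr)&\le P[\xi(J_T)\neq W]=P\Bigl[\bigcup_{k,l}\{\xi(C_{kl})\ge 2\}\Bigr]\\
&\le\frac14\sum_{k,l}\mathbb{E}\bigl[(\xi(C_{kl}))^2 1\{\xi(C_{kl})\ge 2\}\bigr]\le 2^{2D+D_2-2}\,\frac{T}{w(T)}\,a\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr)
\end{aligned}
$$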
and by Proposition A.C,


dTv(Po(A), Po(^( Jt))) 

£m,n ( 1 ’^’7tbi) |A “' (jT)l 
= ( 1A 7tm)S (KCkl)_! ’ kl) 


< 1A 


2 D / 2 V^ 



o2£>+D 2 -2 1 

) ^ w(T) 


a 


2 D 2 


w{T)h(T) 
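The exact constant of Proposition A.C is not reproduced here. As a hedged numerical illustration of the simplest estimate of this kind, the Python sketch below (function names hypothetical) computes $d_{TV}(\mathrm{Po}(a),\mathrm{Po}(b))=\frac12\sum_k|P[\mathrm{Po}(a)=k]-P[\mathrm{Po}(b)=k]|$ and compares it with the coupling bound $1-e^{-|a-b|}\le|a-b|$. The coupling bound follows from writing the larger Poisson variable as the smaller one plus an independent $\mathrm{Po}(|a-b|)$ variable. It is weaker than Proposition A.C for large means, but it is the bound used termwise in estimates such as (2.13).

```python
import math


def poisson_pmf(lam, kmax):
    """[P[Po(lam) = k] for k = 0..kmax], via p_k = p_{k-1} * lam / k."""
    p = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        p.append(p[-1] * lam / k)
    return p


def dtv_poisson(a, b):
    """sup_A |Po(a)(A) - Po(b)(A)| = (1/2) sum_k |P[Po(a)=k] - P[Po(b)=k]|,
    truncated far into the tail, where the omitted mass is negligible."""
    kmax = int(max(a, b) + 20 * math.sqrt(max(a, b)) + 50)
    p, q = poisson_pmf(a, kmax), poisson_pmf(b, kmax)
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def coupling_bound(a, b):
    """Probability that an independent Po(|a - b|) summand is nonzero."""
    return 1.0 - math.exp(-abs(a - b))
```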


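As for the remaining term, $d_{TV}(\mathcal{L}(W),\mathrm{Po}(\lambda))$, we can proceed exactly as we did with $d_2(\mathcal{L}(\Xi\theta_T^{-1}),\mathcal{L}(H\theta_T^{-1}))$, with the only difference that now we use the classical local Stein-Chen Theorem A.A. Thus,

$$d_{TV}\bigl(\mathcal{L}(W),\mathrm{Po}(\lambda)\bigr)\le\min\Bigl(1,\frac{1}{\lambda}\Bigr)\sum_{k,l}\bigl(p_{kl}^2+p_{kl}\mathbb{E}Z_{kl}+\mathbb{E}(I_{kl}Z_{kl})\bigr)+\min\Bigl(1,\frac{1.65}{\sqrt{\lambda}}\Bigr)\sum_{k,l}e_{kl}$$

with

$$e_{kl}=2\max_{B\in\sigma(I_{ij};\,(i,j)\in\Gamma^w_{kl})}\bigl|\mathrm{cov}(I_{kl},1_B)\bigr|.$$

All notation has exactly the same meaning as it had in the proof of Theorem 2.A, so except for the logarithmic factor in front of the first sum, and the constant 1.65 in front of the second, we get exactly the same upper bound for $d_{TV}(\mathcal{L}(W),\mathrm{Po}(\lambda))$ as we did for $d_2(\mathcal{L}(\Xi\theta_T^{-1}),\mathcal{L}(H\theta_T^{-1}))$.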

Assembling all the different pieces yields the result claimed. □


2.4. Results for measure preserving transformations $\bar\theta_T$. When we consider a stretch factor $w(T)^{1/D_1}=o(T^{1/D_1})$, the expected number of points of the transformed process $\xi\theta_T^{-1}$ contained within the fixed cube $J$ goes to infinity as $T\to\infty$ if $l>0$, which for some applications is not desirable (e.g., if we want to approximate $\xi\theta_T^{-1}|_J$ by a Poisson process that does not depend on $T$; see Section 2.5). We therefore formulate another theorem in this section, which deals with the case where we adjust the volume of the cuboid $J$ to the volume of the cuboids $\bar J_T$, and thus produce space for the additional points.

In this regard, let $\bar\theta_T$ and $\bar J_T$, defined as in Section 1, be our substitutes for the transformation $\theta_T$ and our enlarged version of the cuboid $J$, respectively. We then obtain the following result, where once more the quantitative form of the upper bound can be found at the end of the proof.











BOUNDS FOR POINT PROCESS APPROXIMATIONS 




Theorem 2.I. Suppose that the prerequisites of Section 1 hold, including the Conditions 1, 2 and $3_\rho$, and let $l>0$.

Then we obtain for arbitrary $m:=m(T)\in\mathbb{Z}_+$ and $h(T)>1$ for every $T>1$:


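$$
\begin{aligned}
d_2\bigl(\mathcal{L}(\xi\bar\theta_T^{-1}|_{\bar J_T}),\mathcal{L}(\eta\bar\theta_T^{-1}|_{\bar J_T})\bigr)
=O\Bigl(\max\Bigl\{&\Bigl(\frac{T}{w(T)}\Bigr)^{1/D_1}\frac{1}{h(T)^{1/D_1}},\ \frac{1}{T^{1/D_2}},\ \log\Bigl(\frac{T}{w(T)}\Bigr)\frac{m^{D_2}+1}{w(T)},\\
&\frac{T}{w(T)}\,a\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr),\ \log\Bigl(\frac{T}{w(T)}\Bigr)a\Bigl(\frac{2^D(2m+1)^{D_2}}{w(T)}\Bigr),\ \sqrt{Th(T)}\,\beta(m)\Bigr\}\Bigr)
\end{aligned}
$$

for $T\to\infty$,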


which is the same order as in Theorem 2.A, apart from the factor $(T/w(T))^{1/D_1}$.


Proof. For a large part we can adopt the proof of Theorem 2.A. We use the same notation and the same discretization as we did there, replacing only $\theta_T$ by $\bar\theta_T$ and $J$ by $\bar J_T$. First note that there is no change at all for the estimate of the Stein term, now written as $d_2(\mathcal{L}(\Xi\bar\theta_T^{-1}),\mathcal{L}(H\bar\theta_T^{-1}))$, because in the Stein estimate only objects in the pre-image of $\bar\theta_T$ have to be considered (the Stein estimate does not take into account the distances between the points!).

But the changes for the estimates of the approximation errors are not exactly huge either: As we have seen in the proof of Theorem 2.A, these errors can be split up into two additive parts, one stemming from the fact that the original and the discretized point process need not have the same numbers of points in every discretization cuboid [see (2.3), resp. (2.6), in the proof of Theorem 2.A] and one stemming from the fact that even when we have the same numbers of points in every discretization cuboid, their positions are, in general, a bit shifted [see (2.4), resp. (2.7)]. Of those two parts only the second is affected by the transition from $\theta_T$ to $\bar\theta_T$ and from $J$ to $\bar J_T$ (inasmuch as the discretization cuboids in the image space get a little bigger), because for the first, we have to deal once more only with objects in the pre-image of $\bar\theta_T$. A short calculation taking into account the above considerations [reproducing inequalities (2.4) and, accordingly, (2.7)] provides as upper bounds for each of the discretization errors $d_2(\mathcal{L}(\xi\bar\theta_T^{-1}|_{\bar J_T}),\mathcal{L}(\Xi\bar\theta_T^{-1}))$ and $d_2(\mathcal{L}(H\bar\theta_T^{-1}),\mathcal{L}(\eta\bar\theta_T^{-1}|_{\bar J_T}))$:


$$\frac12\Bigl(\Bigl(\frac{T}{w(T)}\Bigr)^{1/D_1}\frac{\sqrt{D_1}}{h(T)^{1/D_1}}+\frac{\sqrt{D_2}}{T^{1/D_2}}\Bigr)+2^{2D+D_2-2}\,\frac{T}{w(T)}\,a\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr).$$
Thus, we obtain as possible upper bounds for the overall $d_2$-distance those of (2.10) and (2.11) with $h(T)^{-1/D_1}$ replaced by $(T/w(T))^{1/D_1}h(T)^{-1/D_1}$, which yields the required qualitative estimate. □


Again we can formulate versions of the other results of Section 2.2 with 
only slight (and very obvious) changes; in particular, we get the following: 


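Corollary 2.J (Convergence to zero in Theorem 2.I). Suppose that the prerequisites of Theorem 2.I hold. Furthermore, suppose that $w(T)\ge\kappa T^\delta$ for $\kappa>0$, $\delta\in(0,1]$ and that

$$a(v)=O(v^r)\quad\text{for } v\to 0\text{ with } r>0,$$

$$\beta(u)=o\bigl(u^{-(1+s)D_2/2}\bigr)\quad\text{for } u\to\infty\text{ with } 1+s>\max\Bigl\{\frac{1-\delta}{\delta}\,\frac{1+r}{r},\ \frac{2-\delta}{\delta}\Bigr\}.$$

Then

$$d_2\bigl(\mathcal{L}(\xi\bar\theta_T^{-1}|_{\bar J_T}),\mathcal{L}(\eta\bar\theta_T^{-1}|_{\bar J_T})\bigr)\to 0\quad\text{for } T\to\infty.$$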


Note that under the $\beta$-mixing or the $\varphi$-mixing condition, no changes in the respective upper bound order obtained in Theorem 2.D are necessary.


2.5. Results for a fixed limiting process. So far we have only examined approximations of the transformed process $\xi\theta_T^{-1}$ (resp. $\xi\bar\theta_T^{-1}$) by a Poisson process which has the expectation measure $\nu\theta_T^{-1}$. Of course, this implies that the expectation measure may (and, unless it is a constant multiple of the Lebesgue measure, does) change as $T$ tends to infinity: The approximating Poisson process, in general, will not be stable. One might therefore ask under what circumstances it is possible to approximate the transformed $\xi$-process by a fixed Poisson process, whose distribution does not depend on $T$, and what loss in terms of the $d_2$-distance one has to face.

First of all, the correct $T$-independent intensity measure for our new Poisson process has to be found. Clearly, for $l>0$, using the transformation $\theta_T$ with $w(T)=o(T)$ is unnatural, because in that case the expected number of points of $\xi\theta_T^{-1}$ contained in $J$ goes to infinity, whereas, of course, for any fixed Poisson process, the expectation of the number of points in $J$ is always finite. So the natural choice for general $w(T)$ is the measure preserving transformation $\bar\theta_T$, together with the enlarged cuboid $\bar J_T$ from Section 2.4.

For the following heuristics we ignore the fact that $\mu_2$ might be a counting measure. Then, restricted to the cuboid $J_T$ for $T$ relatively large, the measure $\nu$ with density $p$ with respect to $\lambda^D$ should be relatively “close” to the measure $\nu':=p(0)\lambda^D$, provided that $p$ is constant in the $\mathbb{R}^{D_2}$-directions [hence, the notation $p(s)=p(s,t)$ for all $s\in\mathbb{R}^{D_1}$, $t\in\mathbb{R}^{D_2}$] and that $p$ satisfies a regularity condition in the $\mathbb{R}^{D_1}$-directions at 0. Thus, restricted to $\bar J_T$, $\nu\bar\theta_T^{-1}$ should be close to $\nu'\bar\theta_T^{-1}$ [which is again $p(0)\lambda^D$, hence, not dependent on $T$] as well, and, therefore, $\mathrm{Po}(p(0)\lambda^D|_{\bar J_T})$ should be a good choice for approximating $\mathcal{L}(\xi\bar\theta_T^{-1}|_{\bar J_T})$.

The following makes the above considerations rigorous. First, we formulate the additional regularity condition for $p$.

Condition 4 (Regularity of $p$). The density $p=d\nu/d\mu$ is constant in the $\mathbb{R}^{D_2}$-directions, so that we can write

$$p(s,t)=p(s)\quad\text{for all } s\in\mathbb{R}^{D_1},\ t\in\mathbb{R}^{D_2}\ (\text{resp. } t\in\mathbb{Z}^{D_2}+\tfrac12).$$

Moreover, $p$ satisfies the following regularity condition in the $\mathbb{R}^{D_1}$-directions: There exist $L>0$ and $z>0$ such that

$$|p(s)-p(0)|\le L|s|^z\quad\text{for all } s\in\mathbb{R}^{D_1}$$

(or for $s\in[-(1/w(T))^{1/D_1},(1/w(T))^{1/D_1})^{D_1}$ for the $T$ one wishes to consider).

We are now in the position to formulate the theorem. 


Theorem 2.K. Suppose that the prerequisites of Section 1 hold, including the Conditions 1, 2, $3_\rho$, as well as the new Condition 4 above. Let $l>0$, $T>1$ (remember that we always assume that $T\in\{n^{D_2}\mid n\in\mathbb{N}\}$ if $\mu_2$ is a counting measure), $m:=m(T)\in\mathbb{Z}_+$, and $h(T)>1$. Then

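$$
\begin{aligned}
d_2\bigl(\mathcal{L}(\xi\bar\theta_T^{-1}|_{\bar J_T}),\mathrm{Po}(p(0)\lambda^D|_{\bar J_T})\bigr)
&\le A(T)+2^{(z+D_1+2D_2)/2}\,\frac{D_1L\tau_{D_1}}{z+D_1}\,\frac{T}{w(T)^{1+z/D_1}}\\
&=O\Bigl(\max\Bigl\{\frac{T}{w(T)^{1+z/D_1}},\ \Bigl(\frac{T}{w(T)}\Bigr)^{1/D_1}\frac{1}{h(T)^{1/D_1}},\ \frac{1}{T^{1/D_2}},\ \log\Bigl(\frac{T}{w(T)}\Bigr)\frac{m^{D_2}+1}{w(T)},\\
&\qquad\qquad\frac{T}{w(T)}\,a\Bigl(\frac{2^{D_2}}{w(T)h(T)}\Bigr),\ \log\Bigl(\frac{T}{w(T)}\Bigr)a\Bigl(\frac{2^D(2m+1)^{D_2}}{w(T)}\Bigr),\ \sqrt{Th(T)}\,\beta(m)\Bigr\}\Bigr)
\end{aligned}
$$

for $T\to\infty$,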


where $A(T):=A(T,m,h(T))$ is the explicit upper bound that we obtained in Theorem 2.I [formula (2.10) or (2.11) with the corresponding modifications] and $\tau_{D_1}=\pi^{D_1/2}/\Gamma(D_1/2+1)$ is the volume of the $D_1$-dimensional unit ball.
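The constant $\tau_{D_1}$ is easy to evaluate. The short Python sketch below (function names hypothetical) computes $\tau_d=\pi^{d/2}/\Gamma(d/2+1)$ and the product $d\,\tau_d$. That product is the surface area of the unit sphere in $\mathbb{R}^d$, which is the factor that typically appears when an integral such as $\int|s|^z\,\lambda^{D_1}(ds)$ over a ball is evaluated in polar coordinates.

```python
import math


def unit_ball_volume(d):
    """tau_d = pi^(d/2) / Gamma(d/2 + 1): volume of the d-dimensional unit ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def unit_sphere_area(d):
    """d * tau_d: surface area of the unit sphere in R^d."""
    return d * unit_ball_volume(d)
```

For instance, `unit_ball_volume(2)` equals $\pi$ and `unit_ball_volume(3)` equals $4\pi/3$, up to floating-point rounding.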
Corollary 2.L. Under the prerequisites of Corollary 2.J plus Condition 4, with $z>D_1(1-\delta)/\delta$, we obtain

$$d_2\bigl(\mathcal{L}(\xi\bar\theta_T^{-1}|_{\bar J_T}),\mathrm{Po}(p(0)\lambda^D|_{\bar J_T})\bigr)\to 0\quad\text{for } T\to\infty,$$

hence, if $\delta=1$ ($z>0$),

® Po(p(0)A D |j), 

by result (1.3). 

Proof of Theorem 2.K. Once again we can largely adopt the proof of Theorem 2.A (or, more precisely, that of Theorem 2.I). This time only the estimate for the discretization error $d_2(\mathcal{L}(H\bar\theta_T^{-1}),\mathcal{L}(\eta\bar\theta_T^{-1}|_{\bar J_T}))$ has to be replaced by an appropriate estimate for our new error $d_2(\mathcal{L}(H\bar\theta_T^{-1}),\mathrm{Po}(p(0)\lambda^D|_{\bar J_T}))$.

We proceed just as we did in Theorem 2.A.

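Let $\eta'\sim\mathrm{Po}(p(0)\lambda^D)$ [consequently, also $\eta'\bar\theta_T^{-1}\sim\mathrm{Po}(p(0)\lambda^D)$], $H'':=\sum_{k,l}\eta'(C_{kl})\delta_{\alpha_{kl}}$, and split up the error as

$$
\begin{aligned}
d_2\bigl(\mathcal{L}(H\bar\theta_T^{-1}),\mathrm{Po}(p(0)\lambda^D|_{\bar J_T})\bigr)
&=d_2\bigl(\mathcal{L}(H\bar\theta_T^{-1}),\mathcal{L}(\eta'\bar\theta_T^{-1}|_{\bar J_T})\bigr)\\
&\le d_2\bigl(\mathcal{L}(H\bar\theta_T^{-1}),\mathcal{L}(H''\bar\theta_T^{-1})\bigr)+d_2\bigl(\mathcal{L}(H''\bar\theta_T^{-1}),\mathcal{L}(\eta'\bar\theta_T^{-1}|_{\bar J_T})\bigr).
\end{aligned}
$$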

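Inequality (2.7) (or, more precisely, the corresponding modification from the proof of Theorem 2.I) yields for the second summand, as before,

$$d_2\bigl(\mathcal{L}(H''\bar\theta_T^{-1}),\mathcal{L}(\eta'\bar\theta_T^{-1}|_{\bar J_T})\bigr)\le\frac12\Bigl(\Bigl(\frac{T}{w(T)}\Bigr)^{1/D_1}\frac{\sqrt{D_1}}{h(T)^{1/D_1}}+\frac{\sqrt{D_2}}{T^{1/D_2}}\Bigr).\tag{2.12}$$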

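For the first summand we get, by the same method as in (2.6),

$$
\begin{aligned}
d_2\bigl(\mathcal{L}(H\bar\theta_T^{-1}),\mathcal{L}(H''\bar\theta_T^{-1})\bigr)
&\le\sum_{k,l}d_{TV}\bigl(\mathrm{Po}(p_{kl}),\mathrm{Po}(p(0)\lambda^D(C_{kl}))\bigr)\\
&\le\sum_{k,l}|\nu(C_{kl})-p_{kl}|+\sum_{k,l}|\nu(C_{kl})-p(0)\lambda^D(C_{kl})|,
\end{aligned}\tag{2.13}
$$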

where the first sum was already estimated in (2.6). Its upper bound, together with the upper bound from (2.12), forms the bound we arrived at for $d_2(\mathcal{L}(H\bar\theta_T^{-1}),\mathcal{L}(\eta\bar\theta_T^{-1}|_{\bar J_T}))$. Therefore, all that is left to do is to show that the second sum on the right-hand side of (2.13) can be estimated by the claimed additional term. This, however, is done very easily:

El^( C ki)-p(0)A D (C kl )|=^ 

k ,i k ,i 


/ (p(s)-p(0))/x(d(s,t)) 
< / |p(s)-p(0)|/i(d(s,t))

J Jji 


< 2° 2 L ■ T 


<2° 2 D 1 Lt D i -T 


[-(1 /wiT)) 1 / D i ,{l/w{T))D D i) D i 


— 2(z+£>i+2-D2)/2 t _ 

z + Di Dl w(T) 1+z / Dl ' □ 


r 

T 


s\ z \ Dl (ds) 
Z+Dl ~ l dr 


3. Applications. The results of Section 2 can be applied in a number of different ways. For example, they yield useful upper bounds for certain theoretical statements about Poisson process approximation, such as classical thinning and superposition theorems (by projection of the point processes involved on the $\mathbb{R}^{D_2}$-directions and the $\mathbb{R}^{D_1}$-directions, resp.). There are also statistical problems where the results of Section 2 can be of help. To give an idea of what is possible, we look at two examples in more detail: in Section 3.1 we consider a fairly general density estimation problem, examined by Ellis (1991), and in Section 3.2 we consider a problem of testing for long range dependence.


3.1. Density estimation. First of all, we need a new regularity condition for the density $p$.

Condition 4' (Regularity of $p$). The density $p=d\nu/d\mu$ is constant in the $\mathbb{R}^{D_2}$-directions, so that we can write

$$p(s,t)=p(s)\quad\text{for all } s\in\mathbb{R}^{D_1},\ t\in\mathbb{R}^{D_2}\ (\text{resp. } t\in\mathbb{Z}^{D_2}+\tfrac12).$$

Moreover, $p$ satisfies the following regularity condition in the $\mathbb{R}^{D_1}$-directions:

$$p\in C^2(\mathbb{R}^{D_1}).$$

Of course, it is enough if $p|_Z\in C^2(Z)$ for a sufficiently large neighborhood $Z$ of $0\in\mathbb{R}^{D_1}$.


Suppose that Condition 4' holds (along with the usual conditions from Section 1), and that we want to estimate the density $p$ at the point $0\in\mathbb{R}^{D_1}$, say.

By way of illustration, it is convenient to think of the $\mathbb{R}^{D_1}$-space as the “data space” (i.e., the space of possible data points) and the $\mathbb{R}^{D_2}$-space as the “ascertainment space” [i.e., the space of points at which data is obtained, typically by continuous observation over time ($\mathbb{R}^{D_2}=\mathbb{R}$ = time axis) or by repetition of experiments ($\mathbb{R}^{D_2}$ with a counting measure as reference measure $\mu_2$)]. An example suggested by Ellis (1986, 1991) is the estimation of the rate at which earthquakes above a certain magnitude occur per unit area and unit time in a certain region. Here we have $D_1=2$ and $D_2=1$, and the points in $\mathbb{R}^3$ represent the positions and times of the observed earthquakes.

Among various methods for density estimation, we choose kernel estimation with a data-independent window width, that is, the window width in the $\mathbb{R}^{D_1}$-directions does not depend directly on the data, but does depend on the “observation span” (which in the discrete case corresponds to the sample size). For a detailed account of density estimation see Silverman (1986). We adapt the usual notation in connection with density estimation to the notation we used in Section 2. Thus, $2T^{1/D_2}$ is our observation span (in $D_2$ directions), $2/w(T)^{1/D_1}$ is the window width (in $D_1$ directions) and our density estimator at the point 0 takes the form

$$\hat p_\xi(0):=\frac{w(T)}{2^{D_2}T}\int_{\mathbb{R}^D}K\bigl(w(T)^{1/D_1}s\bigr)\,1_{[-T^{1/D_2},T^{1/D_2})^{D_2}}(t)\,\xi(d(s,t)),$$
where the function $K$ is our kernel, which fulfills the following condition:

Condition 5 (Shape of $K$). The kernel $K\colon\mathbb{R}^{D_1}\to\mathbb{R}_+$ satisfies:

(i) $K(s)=0$ for $s\notin[-1,1)^{D_1}$;

(ii) $K|_{[-1,1)^{D_1}}$ is Lipschitz (w.r.t. $d_0$ restricted to $\mathbb{R}^{D_1}$) with constant $l(K)$;

(iii) $\int K(s)\,ds=1$;

(iv) $\int K(s)s\,ds=0$.

Note that $K$ does not have to be continuous on the boundary of $[-1,1)^{D_1}$, and that it is reasonable to choose a kernel $K$ that is radially symmetric (or at least an even function in each coordinate), in which case Condition 5(iv) is satisfied. We now write

$$f(x):=2^{D_1}K(s)\cdot 1_{[-1,1)^{D_2}}(t)\quad\text{for } x:=(s,t)\in\mathbb{R}^{D_1}\times\mathbb{R}^{D_2}=\mathbb{R}^D,$$

so that $f|_J$ is Lipschitz (w.r.t. $d_0$ on $\mathbb{R}^D$) with constant $2^{D_1}l(K)$; by the transformation theorem for integrals, we obtain

$$\hat p_\xi(0)=\frac{1}{|J_T|}\int_{\mathbb{R}^D}f(x)\,\xi\theta_T^{-1}(dx).$$
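For concreteness, here is a minimal Python sketch of the estimator evaluated on one realization given as a finite list of points $(s,t)$. It assumes the form $\hat p_\xi(0)=\frac{w(T)}{2^{D_2}T}\sum K(w(T)^{1/D_1}s)\,1\{t\in[-T^{1/D_2},T^{1/D_2})^{D_2}\}$, which is what $\frac{1}{|J_T|}\int f\,d\xi\theta_T^{-1}$ reduces to because $|J_T|=2^DT/w(T)$. The function names are hypothetical. The uniform kernel $K=2^{-D_1}$ on $[-1,1)^{D_1}$ is just one admissible choice for illustration: it is constant on $[-1,1)^{D_1}$, integrates to 1 and is symmetric, so it satisfies Condition 5.

```python
import math


def kde_at_zero(points, w, T, D1, D2, kernel):
    """Kernel estimate of p(0) from the points (s, t) of one realization:
    w/(2^D2 T) * sum of K(w^(1/D1) s) over points with t in [-T^(1/D2), T^(1/D2))^D2.

    points: iterable of (s, t), s a D1-tuple and t a D2-tuple.
    """
    half_span = T ** (1.0 / D2)
    scale = w ** (1.0 / D1)
    total = 0.0
    for s, t in points:
        if all(-half_span <= tj < half_span for tj in t):
            total += kernel(tuple(scale * sj for sj in s))
    return w / (2 ** D2 * T) * total


def uniform_kernel(s):
    """K(s) = 2^(-D1) on [-1, 1)^D1 and 0 elsewhere."""
    return 0.5 ** len(s) if all(-1 <= sj < 1 for sj in s) else 0.0
```

With $w(T)=10$, $T=1$ and $D_1=D_2=1$, only points with $|s|<0.1$ and $t\in[-1,1)$ contribute, each with weight $\frac{10}{2}\cdot\frac12$.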

The way is now clear for the application of Theorem 2.A. Our primary goal will be to estimate a probability distance $d$ between the distribution of our estimator $\hat p_\xi(0)$ and the distribution that is concentrated at the true value $p(0)$. To do this, we will first estimate $d(\mathcal{L}(\hat p_\xi(0)),\mathcal{L}(\hat p_\eta(0)))$ with the aid of Theorem 2.A, and then utilize the excellent properties of Poisson point processes to obtain an upper bound for $d(\mathcal{L}(\hat p_\eta(0)),\delta_{p(0)})$. The two corresponding results are contained in the following theorems. For the distance $d$, we choose the bounded Wasserstein distance, as defined in Section 1, because the other distances that we have used so far are too strong to be useful: $d_{TV}(\mathcal{L}(\hat p_\xi(0)),\delta_{p(0)})$ is generally too big, and is even always equal to 1 whenever $\hat p_\xi(0)$ is a continuous random variable, because then

$$1\ge d_{TV}\bigl(\mathcal{L}(\hat p_\xi(0)),\delta_{p(0)}\bigr)\ge\bigl|P[\hat p_\xi(0)=p(0)]-P[p(0)=p(0)]\bigr|=1;$$

and for the Wasserstein distance $d_W(\mathcal{L}(\hat p_\xi(0)),\mathcal{L}(\hat p_\eta(0)))$, there seem to be insurmountable difficulties in obtaining a useful upper bound in Theorem 3.A.

Theorem 3.A. Suppose that the prerequisites of Section 1 hold, including the Conditions 1, 2, $3_\rho$, as well as the additional Conditions 4' and 5. Let $l>0$, and for $T>1$, let $m:=m(T)\in\mathbb{Z}_+$, $h(T)>1$ and also $w(T)=O(T^{\delta^*})$ for $T\to\infty$ with $\delta^*\in(0,1)$. Then

rfBw(^(^(0)),T(p r) (0))) 

< ^ W{ ^ M + iy 2 (£(fdf 1 \j),£( v df 1 \ J )) + 2 D H(K)5 T (M) 

f 1 1 T / T \m D2 + l T / 2 d2 \ 

~ \h(T) 1 / D C T 1 / og { w (T)J w(T) ' w{T) a \w(T)h{T)J' 

log '(uu) 6 ( 2 < ‘w(r) 1> ' )' 

/or T oo, 

where $M:=M(T)\in\mathbb{N}^*$ with $M\ge 3\nu(J_T)$ arbitrary and

$$\delta_T(M)=\frac{\nu(J_T)^M}{M!}\,e^{-\nu(J_T)},$$

which decays exponentially in $M$ as $T$ tends to infinity. Thus, we obtain the same order for the upper bound as in Theorem 2.A.

Remark 3.B. The upper bound given in Theorem 3.A remains true 
for general w(T) = 0(T). However, if w(T) goes to infinity at a rate that 
is too close to T, then M(T ) has to be chosen to grow somewhat faster 
than T/w(T), and then the order of the upper bound is a little worse (by a 
logarithmic factor in T) than the one stated in Theorem 3.A. 

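Proof of Theorem 3.A. Let $\xi'\sim\mathcal{L}(\xi)$, $\eta'\sim\mathcal{L}(\eta)=\mathrm{Po}(\nu)$, and $X:=\hat p_{\xi'}(0)$, $Y:=\hat p_{\eta'}(0)$. Then we have

$$d_{BW}\bigl(\mathcal{L}(\hat p_\xi(0)),\mathcal{L}(\hat p_\eta(0))\bigr)=\sup_{g\in\mathcal{F}_{BW}}\bigl|\mathbb{E}g(X)-\mathbb{E}g(Y)\bigr|$$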
with

|EsTO - Ej(y)| 

(3 1 +E(| S (X )(.,))) 

< E(|X - y|i {sV(J)=llVM( ) + v\gei\j) / i/tffV)] 

for every $g$ in $\mathcal{F}_{BW}$. For the first summand, we obtain


e(|x i l 1 {e , 0y 1 (j)=?7'0yi(j)}) 


= E 


\Jt\ 


f f(x)ge T 1 (dx) - [ f(x)rf9 T l ((Jji) 
JR D JR D 




<2 Dl l(K)E( V ' e Tj^ 

the latter inequality by the definition of the $d_1$-distance and because $f|_J$ is Lipschitz. Next we utilize the fact that since $\eta'\theta_T^{-1}(J)$ is Poisson distributed with parameter $\nu_T:=\nu(J_T)$, it exceeds a certain bound $M:=M(T)\in\mathbb{N}^*$ with $M+1>2\nu_T$ only with very small probability. As noted in Barbour, Holst and Janson (1992), Proposition A.2.3, the relation

$$P[\mathrm{Po}(\nu_T)>M]\le\frac{M+1}{M+1-\nu_T}\,P[\mathrm{Po}(\nu_T)=M]\le 2\,\frac{\nu_T^M}{M!}\,e^{-\nu_T}$$

holds, and, thus, 

E ( ? ^Jr| J) dl( ^ 1|j,7?/ ^ 1|j) ) 


+ E 


VflrV) 

\Jt\ 




< T^{d^e^\j,r{e^\j)) + |^|PfoV(J) > M] 

where we use the notation 

v M 

s t (m) = 2K m e ~ UT - 

Furthermore, for $M\ge 3\nu_T$, the DeMoivre-Stirling formula gives

$$\delta_T(M)\le\mathrm{const}\cdot\Bigl(\frac{\nu_T}{M}\Bigr)^M e^{M-\nu_T}\le\mathrm{const}\cdot\Bigl(\frac{e}{3}\Bigr)^M e^{-\nu_T}.$$
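Both tail estimates can be checked numerically. The Python sketch below (function names hypothetical) computes $P[\mathrm{Po}(\nu)\ge M]$, which dominates $P[\mathrm{Po}(\nu)>M]$. It compares this with the geometric-series bound $\frac{M+1}{M+1-\nu}P[\mathrm{Po}(\nu)=M]$, which is at most $2P[\mathrm{Po}(\nu)=M]$ once $M+1>2\nu$. It also compares $P[\mathrm{Po}(\nu)=M]$ with the Stirling-type bound $(\nu/M)^Me^{M-\nu}$, which follows from $M!\ge(M/e)^M$ and is at most $(e/3)^Me^{-\nu}$ when $M\ge 3\nu$. The pmf is computed in log space to avoid overflow.

```python
import math


def poisson_pmf(nu, k):
    """P[Po(nu) = k], computed in log space to avoid overflow."""
    return math.exp(k * math.log(nu) - nu - math.lgamma(k + 1))


def poisson_tail(nu, M, extra=400):
    """P[Po(nu) >= M], summing the pmf from M to M + extra (rest is negligible)."""
    return sum(poisson_pmf(nu, k) for k in range(M, M + extra))


def tail_bounds(nu, M):
    """Return (P[Po(nu) >= M], ratio bound, Stirling-type bound on P[Po(nu) = M])."""
    pm = poisson_pmf(nu, M)
    ratio_bound = (M + 1) / (M + 1 - nu) * pm    # valid whenever M + 1 > nu
    stirling = (nu / M) ** M * math.exp(M - nu)  # >= pm because M! >= (M/e)^M
    return poisson_tail(nu, M), ratio_bound, stirling
```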
The second summand from (3.1) is estimated as

$$P\bigl[\xi'\theta_T^{-1}(J)\neq\eta'\theta_T^{-1}(J)\bigr]=\mathbb{E}\bigl[d_1(\xi'\theta_T^{-1}|_J,\eta'\theta_T^{-1}|_J)\,1\{\xi'\theta_T^{-1}(J)\neq\eta'\theta_T^{-1}(J)\}\bigr]\le\mathbb{E}\,d_1(\xi'\theta_T^{-1}|_J,\eta'\theta_T^{-1}|_J).$$

Hence, we obtain altogether in (3.1), 

\Eg(X)-Eg(Y)\ 

< ( ^ + l^^l^Vl,)) + 2 d H(K)6 t (M) 

for every $g\in\mathcal{F}_{BW}$ and every pair of random variables $\xi',\eta'$ with $\xi'\sim\mathcal{L}(\xi)$, $\eta'\sim\mathcal{L}(\eta)$. Forming the infimum over $\xi'$ and $\eta'$ yields on the right-hand side the $d_2$-distance ($\theta_T$ is bijective), and forming the supremum over $g$ on the left-hand side, the bounded Wasserstein distance. Thus, we obtain the statement. □

The second result that was discussed above is contained in the next theorem. We write $\|\cdot\|_2$ for the $L^2$-norm with respect to the Lebesgue measure on $\mathbb{R}^{D_1}$.

Theorem 3.C. Suppose that the prerequisites of Section 1 hold, including the Conditions 1, 2, $3_\rho$, as well as the additional Conditions 4' and 5. Let $l>0$, and for $T>1$, let $m:=m(T)\in\mathbb{Z}_+$, $h(T)>1$ and also $w(T)=O(T^{\delta^*})$ for $T\to\infty$ with $\delta^*\in(0,1)$. Then

dBw(£0%(O)),<5 p (o)) 

< dBw(£G%(0)) 5 £(j5r?(0))) 

\w(t) l' ( 1 \ 

+ V ^ l|En|2 V ~T~~ + w(T) 2 / Dl + °{w(T)V^) 

-c(xFW 1 1 _— 

\V r ’ w(T) 2 / Dl ’ ^(T) 1 / 15 ! ’ Ti/ D2 ’ 

HLaf 2 ° 2 i 

6 U(T)J w(T) 'w<T ) \w(T)h(T)) ’ 

logt (/yi) a ( 2 ^(T) 11 2 )' 

/or T —> 00 , 

where $L'$ is a nonnegative constant (depending on $p$ and $K$); if $K$ possesses certain symmetry properties (especially if $K$ is radially symmetric), we can write

$$L':=\frac12\,\Delta p(0)\int s_1^2K(s)\,\lambda^{D_1}(ds),$$

where $\Delta$ denotes the $D_1$-dimensional Laplace operator.

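Proof. Due to Theorem 3.A we only have to estimate $d_{BW}(\mathcal{L}(\hat p_\eta(0)),\delta_{p(0)})$ for $\eta\sim\mathrm{Po}(\nu)$. We decompose this distance as

$$
\begin{aligned}
d_{BW}\bigl(\mathcal{L}(\hat p_\eta(0)),\delta_{p(0)}\bigr)&\le d_{BW}\bigl(\mathcal{L}(\hat p_\eta(0)),\delta_{\mathbb{E}\hat p_\eta(0)}\bigr)+d_{BW}\bigl(\delta_{\mathbb{E}\hat p_\eta(0)},\delta_{p(0)}\bigr)\\
&\le\mathbb{E}|\hat p_\eta(0)-\mathbb{E}\hat p_\eta(0)|+|\mathbb{E}\hat p_\eta(0)-p(0)|\\
&\le\mathrm{sd}(\hat p_\eta(0))+\mathrm{bias}(\hat p_\eta(0)).
\end{aligned}
$$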


For the standard deviation we obtain 



where the second and third steps are applications of Campbell’s theorem for the variance of an integral w.r.t. a Poisson point process [see Kingman (1993)] and Fubini’s theorem, respectively [note that $(\lambda^{D_1}\otimes\mu_2)\theta_T^{-1}=\frac{1}{w(T)}\lambda^{D_1}\otimes\mu_2(T^{1/D_2}\mathrm{Id}_{D_2})$, where $\mathrm{Id}_{D_2}\colon\mathbb{R}^{D_2}\to\mathbb{R}^{D_2}$ is the identity]. An application of Campbell’s theorem for the expectation [see Kingman (1993)] and Fubini’s theorem again then yields

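$$
\begin{aligned}
\mathbb{E}\hat p_\eta(0)&=\frac{1}{|J_T|}\int_{\mathbb{R}^D}f(x)\,\nu\theta_T^{-1}(dx)\\
&=\frac{w(T)}{2^DT}\Bigl(\int_{\mathbb{R}^{D_1}}2^{D_1}K\bigl(w(T)^{1/D_1}s\bigr)p(s)\,\lambda^{D_1}(ds)\Bigr)\times\mu_2\bigl([-T^{1/D_2},T^{1/D_2})^{D_2}\bigr)\\
&=\int_{[-1,1)^{D_1}}K(s)\,p\Bigl(\frac{s}{w(T)^{1/D_1}}\Bigr)\lambda^{D_1}(ds).
\end{aligned}
$$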
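Thus, we obtain for the bias

$$
\begin{aligned}
|\mathbb{E}\hat p_\eta(0)-p(0)|&=\Bigl|\int_{[-1,1)^{D_1}}K(s)\Bigl(p\Bigl(\frac{s}{w(T)^{1/D_1}}\Bigr)-p(0)\Bigr)\lambda^{D_1}(ds)\Bigr|\\
&\le\Bigl|\int_{[-1,1)^{D_1}}K(s)\,\frac{1}{w(T)^{1/D_1}}\,\partial p(0)s\,\lambda^{D_1}(ds)\Bigr|+\Bigl|\int_{[-1,1)^{D_1}}K(s)\,\frac{1}{2w(T)^{2/D_1}}\,\partial^2p(0)(s,s)\,\lambda^{D_1}(ds)\Bigr|\\
&\quad+\int_{[-1,1)^{D_1}}K(s)\,\frac{1}{2w(T)^{2/D_1}}\,\max_{0\le\vartheta\le 1}\Bigl\|\partial^2p\Bigl(\frac{\vartheta s}{w(T)^{1/D_1}}\Bigr)-\partial^2p(0)\Bigr\|\,|s|^2\,\lambda^{D_1}(ds)
\end{aligned}
$$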

by Taylor’s approximation, where $\|\cdot\|$ is the standard norm for bilinear forms on $\mathbb{R}^{D_1}$. Of the last three summands, the first is always zero because of Condition 5(iv), the second can be estimated by $L'/w(T)^{2/D_1}$ with a constant $L'$, which for “nice” kernels (e.g., if $K$ is radially symmetric) can be written as

$$L'=\frac12\,\Delta p(0)\int s_1^2K(s)\,\lambda^{D_1}(ds),$$

and the third is of order $o(w(T)^{-2/D_1})$ because of the continuity of $\partial^2p$ at 0. Thus,

$$\mathrm{bias}(\hat p_\eta(0))\le\frac{L'}{w(T)^{2/D_1}}+o\Bigl(\frac{1}{w(T)^{2/D_1}}\Bigr).\qquad\square$$
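The leading bias term can be sanity-checked numerically in a toy case chosen only for illustration: $D_1=1$, the standard normal density $\varphi$ as a hypothetical true $p$, and the uniform kernel $K=\frac12$ on $[-1,1)$. With $h:=w(T)^{-1}$, the expectation formula above gives $\mathbb{E}\hat p_\eta(0)=\int_{-1}^1\frac12\varphi(hs)\,ds=\operatorname{erf}(h/\sqrt2)/(2h)$. The signed leading term is $\frac12p''(0)\int s^2K(s)\,ds\cdot h^2=-\varphi(0)h^2/6$, since $\varphi''(0)=-\varphi(0)$ and $\int s^2K(s)\,ds=\frac13$. The Python sketch below (function names hypothetical) compares the two.

```python
import math


def phi(x):
    """Standard normal density (the hypothetical true density p)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def expected_estimate(h):
    """E p_hat(0) = int K(s) p(h s) ds for K = 1/2 on [-1, 1) and p = phi,
    where h = 1/w(T) (D1 = 1); closed form via the error function."""
    return math.erf(h / math.sqrt(2)) / (2 * h)


def predicted_bias(h):
    """Signed leading term (1/2) p''(0) * int s^2 K(s) ds * h^2 = -phi(0) h^2 / 6."""
    return 0.5 * (-phi(0.0)) * (1.0 / 3.0) * h * h
```

The ratio of the true bias to the predicted term behaves like $1-0.15h^2$, so it is within about $0.2\%$ of 1 already for $h=0.1$.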

Once more we formulate the conditions under which the upper bound 
goes to zero. 

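Corollary 3.D (Convergence to zero in Theorem 3.C). Suppose that the prerequisites of Theorem 3.C hold. Furthermore, suppose that $w(T)\ge\kappa T^\delta$ for $\kappa>0$, $\delta\in(0,1)$ and that

$$a(v)=O(v^r)\quad\text{for } v\to 0\text{ with } r>0,$$

$$\beta(u)=O\bigl(u^{-(1+s)D_2/2}\bigr)\quad\text{for } u\to\infty\text{ with } 1+s>\max\Bigl\{\frac{1-\delta}{\delta}\,\frac{1+r}{r},\ \frac{1}{\delta}\Bigr\}.$$

Then

$$d_{BW}\bigl(\mathcal{L}(\hat p_\xi(0)),\delta_{p(0)}\bigr)\to 0\quad\text{for } T\to\infty,$$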
and, therefore, since the $d_{BW}$-distance metrizes convergence in distribution [see Dudley (1989), Theorem 11.3.3] and since $\delta_{p(0)}$ is the distribution of a constant, we obtain

$$\hat p_\xi(0)\to p(0)\quad\text{in probability for } T\to\infty,$$

that is, the consistency of the estimator $\hat p_\xi(0)$.

Remark 3.E. The consistency of $\hat p_\xi(0)$ was already obtained as a consequence of Theorem 2.5 in Ellis (1991) under conditions that were similar, but for the most part somewhat more general. So Corollary 3.D is not so much a new result, but rather a crosscheck on the suitability of the explicit upper bound obtained in Theorem 3.C.

Proof. Let $M:=\lceil 3\nu(J_T)\rceil$ in Theorem 3.A. We then get immediately by applying Theorems 3.C and 3.A and Corollary 2.B that $d_{BW}(\mathcal{L}(\hat p_\xi(0)),\delta_{p(0)})$ converges to zero. □

3.2. Testing for long range dependence. Suppose $\xi$ is a stationary point process on $\mathbb{R}^D$ with expectation measure $\nu=\iota\cdot\lambda^D$ ($\iota$ known or estimated) which satisfies the conditions of Section 1, except for Condition 3. We would like to test from a single realization of $\xi$ whether there is important long range dependence in the $\mathbb{R}^{D_2}$-directions or not (our null hypothesis). “No important long range dependence” means here that Condition $3_x$ is satisfied for given $x\in\{\beta,\rho,\varphi\}$ and $\beta$, corresponding to the minimal mixing rate one wants to test for. For the sake of illustration, think of the $\mathbb{R}^{D_1}$-direction(s) as time and the $\mathbb{R}^{D_2}$-directions as space. Imagine that for fixed $T>1$, the points of $\xi$ in $J_T$ denote the times and locations of incidences of a certain rare disease, which is observed in a large area (e.g., a country or a continent) over a relatively short period of time (e.g., some months or a year).

Under the null hypothesis, by Theorem 2.A, respectively, Theorem 2.D, the distribution of $\xi\theta_T^{-1}|_J$ will be close to the distribution of $\eta\theta_T^{-1}|_J$, which here is just the homogeneous Poisson process on $J$ with intensity $(T/w(T))\cdot\iota$. There are various reasonable statistics for testing the hypothesis of “complete spatial randomness” in point patterns; one such statistic, $U\colon\mathcal{M}_p\to\mathbb{R}$, is the average nearest neighbor distance in the data, which can be shown to be Lipschitz continuous with respect to the $d_1$-distance with a Lipschitz constant that we denote by $L_U$.

We wish to find an approximate critical value $t_\alpha$ for, say, a one-sided test of size $\alpha$ of the null hypothesis against an aggregated alternative (i.e., the alternative that there is a certain amount of “long range” clustering), using the statistic $\tilde U$, where $\tilde U(\varrho):=U(\varrho\theta_T^{-1}|_J)$ for every point measure $\varrho$ on $\mathbb{R}^D$. To do so, fix $K>0$ and choose $t_\alpha$ so that

$$\mathbb{E}f_{t_\alpha,K}(\tilde U(\eta))+KL_U\varepsilon=\alpha,$$
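where $\varepsilon$ is our upper bound for $d_2(\mathcal{L}(\xi\theta_T^{-1}|_J),\mathcal{L}(\eta\theta_T^{-1}|_J))$, and

$$f_{t,K}(x):=\begin{cases}1, & \text{if } x\le t,\\ 1-K(x-t), & \text{if } t<x<t+\frac1K,\\ 0, & \text{if } x\ge t+\frac1K,\end{cases}$$

is a $K$-Lipschitz approximation of the indicator $1_{(-\infty,t]}$. This yields

$$0\le\alpha-P[\tilde U(\xi)\le t_\alpha]\le\mathbb{E}f_{t_\alpha,K}(\tilde U(\eta))-\mathbb{E}f_{t_\alpha-1/K,K}(\tilde U(\eta))+2KL_U\varepsilon.$$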

Thus, if $\varepsilon$ is very small (i.e., the conditions for Theorem 2.A, resp. Theorem 2.D, are strong enough), a large $K$ can be chosen, and, consequently, we can adjust the size of our test to be only slightly below $\alpha$.

It should be noted that the distribution of $\tilde U(\eta)$ is not known, but it can be simulated very easily. Also, there are good normal approximations of $\mathcal{L}(\tilde U(\eta)\mid|\eta|=N)$ for $N$ not too small which can be of use. See Ripley [(1981), Section 8.2] for further details.
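The simulation step can be sketched as follows. This is a minimal Monte Carlo illustration under stated assumptions, not the authors' procedure: $D=2$ so that $J=[-1,1)^2$ has area 4; `intensity` stands for $(T/w(T))\cdot\iota$; `L_U` and `eps` are user-supplied stand-ins for $L_U$ and $\varepsilon$; and the value 0 for patterns with fewer than two points is a convention of this sketch only. All function names are hypothetical. Because $t\mapsto\mathbb{E}f_{t,K}(\tilde U(\eta))$ is nondecreasing, $t_\alpha$ is found by bisection on the simulated values.

```python
import math
import random


def poisson_count(mean, rng):
    """Sample Po(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    n, acc = 0, rng.expovariate(1.0)
    while acc <= mean:
        n += 1
        acc += rng.expovariate(1.0)
    return n


def avg_nn_distance(pts):
    """Average nearest-neighbour distance; 0.0 for fewer than two points."""
    if len(pts) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(pts):
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / len(pts)


def f_tK(x, t, K):
    """K-Lipschitz approximation of the indicator of (-inf, t]."""
    if x <= t:
        return 1.0
    if x < t + 1.0 / K:
        return 1.0 - K * (x - t)
    return 0.0


def calibrate(alpha, K, L_U, eps, intensity, n_sim=2000, seed=1):
    """Monte Carlo choice of t_alpha with E f_{t,K}(U(eta)) + K*L_U*eps = alpha,
    eta a homogeneous Poisson process on J = [-1, 1)^2.
    Returns None if K*L_U*eps >= alpha (no admissible critical value)."""
    target = alpha - K * L_U * eps
    if target <= 0:
        return None
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        n = poisson_count(4.0 * intensity, rng)  # area of J is 4
        pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        sims.append(avg_nn_distance(pts))

    def mean_f(t):
        return sum(f_tK(u, t, K) for u in sims) / n_sim

    lo, hi = min(sims) - 1.0 / K, max(sims)  # mean_f is ~0 at lo and 1 at hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if mean_f(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

The trade-off discussed above is visible in the `target` line: a larger `K` sharpens the indicator approximation but consumes more of the level $\alpha$ through $KL_U\varepsilon$.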

APPENDIX: LOCAL STEIN THEOREMS 

The central results of this article are achieved by applying estimates that were obtained in one form or another by Stein’s method. Since it is far beyond the scope of this article to summarize in detail the classical Stein-Chen method (Stein’s method for the approximation of a sum of indicator random variables by a Poisson random variable) or what in this article is sometimes called the “generalized Stein-Chen method” (Stein’s method for the approximation of an indicator point process by a discrete Poisson point process), we only present very briefly the required results. The proofs of these results and the method behind them, as well as a wealth of related material, can be found in Barbour, Holst and Janson (1992).

Let $\Gamma$ be any finite nonempty index set and $(I_i)_{i \in \Gamma}$ a family of indicator random variables with a local dependence property, that is, for every $i \in \Gamma$, the set $\Gamma_i := \Gamma \setminus \{i\}$ can be partitioned as $\Gamma_i = \Gamma_i^s \cup \Gamma_i^w$ into a set $\Gamma_i^s$ of indices $j$ for which $I_j$ depends “strongly” on $I_i$, and a set $\Gamma_i^w$ of indices $j$ for which $I_j$ depends “weakly” on $I_i$. Herein, the terms “strongly” and “weakly” are not meant as a restriction on the partition of $\Gamma_i$, but serve only illustrative purposes. The same holds true for the term “local dependence,” which does not have to possess any representation in the spatial structure of $\Gamma$ (in our applications in Section 2 it always does, though). We now write $Z_i := \sum_{j \in \Gamma_i^s} I_j$, $Y_i := \sum_{j \in \Gamma_i^w} I_j$, $p_i := \mathbb{E} I_i > 0$ (w.l.o.g.) for every $i \in \Gamma$ and set

$$W := \sum_{i \in \Gamma} I_i, \qquad \lambda := \mathbb{E} W = \sum_{i \in \Gamma} p_i.$$

Furthermore, we choose arbitrary points $(\alpha_i)_{i \in \Gamma}$ in any desired complete, separable metric space $(\mathcal{A}, d_0)$ with $d_0 \le 1$ and set $\Xi := \sum_{i \in \Gamma} I_i \delta_{\alpha_i}$.
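As an illustration of this local dependence structure (an example chosen here, not one used in the article), take independent Bernoulli($q$) variables $\xi_0, \dots, \xi_n$ and $I_i := \xi_i \xi_{i+1}$ for $i = 0, \dots, n-1$. Choosing $\Gamma_i^s$ as the indices $j$ with $|i - j| = 1$ makes $(I_j : j \in \Gamma_i^w)$ independent of $I_i$, so all $e_i$ appearing in Theorems A.A and A.D vanish, and the remaining ingredients have closed forms. The sketch computes $\lambda$ and $\sum_{i} (p_i^2 + p_i \mathbb{E} Z_i + \mathbb{E}(I_i Z_i))$; the function names are hypothetical.

```python
from typing import List, Tuple


def strong_neighbours(i: int, n: int) -> List[int]:
    """Gamma_i^s for I_i = xi_i * xi_{i+1}: the indicators sharing a xi with I_i."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n]


def two_run_ingredients(n: int, q: float) -> Tuple[float, float]:
    """Return (lambda, sum_i (p_i^2 + p_i E Z_i + E(I_i Z_i))) for
    I_i = xi_i * xi_{i+1}, i = 0..n-1, with xi_0..xi_n iid Bernoulli(q).

    Every j in Gamma_i^w has |i - j| >= 2, so (I_j : j in Gamma_i^w) is built
    from xi's not used by I_i and is independent of I_i; hence e_i = 0.
    """
    p = q * q                   # p_i = E I_i
    lam = n * p                 # lambda = E W
    total = 0.0
    for i in range(n):
        s = len(strong_neighbours(i, n))
        EZ = s * p              # E Z_i = sum of p_j over Gamma_i^s
        EIZ = s * q ** 3        # E(I_i I_j) = P(three consecutive xi's equal 1) for |i - j| = 1
        total += p * p + p * EZ + EIZ
    return lam, total
```

For $n = 3$ and $q = 1/2$ it returns $\lambda = 0.75$ and a first sum of $0.9375$.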
A.1. Poisson approximation of the distribution of the sum W of indicators. By applying the classical Stein-Chen method [see Chen (1975)], the following result is obtained.

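Theorem A.A (Local Stein-Chen theorem for sums of indicators). With the above definitions, we have

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \min\Bigl(1, \frac{1}{\lambda}\Bigr) \sum_{i \in \Gamma} \bigl(p_i^2 + p_i \mathbb{E} Z_i + \mathbb{E}(I_i Z_i)\bigr) + \min\Bigl(1, \frac{1}{\sqrt{\lambda}}\Bigr) \sum_{i \in \Gamma} e_i,$$

where

$$e_i = \mathbb{E}\bigl|\mathbb{E}(I_i \mid (I_j : j \in \Gamma_i^w)) - p_i\bigr| = 2 \max_{B \in \sigma(I_j : j \in \Gamma_i^w)} \bigl|\mathrm{cov}(I_i, \mathbf{1}_B)\bigr|.$$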

Proof. See Barbour, Holst and Janson (1992), Theorem 1.A. □

Remark A.B. The order of the upper bound in Theorem A.A cannot 
generally be improved. See Barbour, Holst and Janson (1992), Chapter 3. 
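The bound of Theorem A.A is easy to evaluate once $p_i$, $\mathbb{E} Z_i$, $\mathbb{E}(I_i Z_i)$ and $e_i$ are known. A simple sanity check, assumed here for illustration, is the case of independent indicators with $\Gamma_i^s = \emptyset$: then $Z_i = 0$ and $e_i = 0$, so the bound reduces to $\min(1, 1/\lambda) \sum_i p_i^2$, and for identical $p_i = p$ the law of $W$ is Binomial, so the exact total variation distance can be computed for comparison. Function names are hypothetical.

```python
import math
from typing import Sequence


def local_stein_chen_bound(p: Sequence[float], EZ: Sequence[float],
                           EIZ: Sequence[float], e: Sequence[float]) -> float:
    """Right-hand side of the local Stein-Chen bound from p_i, E Z_i, E(I_i Z_i), e_i."""
    lam = sum(p)
    first = sum(pi * pi + pi * ez + eiz for pi, ez, eiz in zip(p, EZ, EIZ))
    return min(1.0, 1.0 / lam) * first + min(1.0, 1.0 / math.sqrt(lam)) * sum(e)


def tv_binomial_poisson(n: int, p: float) -> float:
    """Exact d_TV(Bin(n, p), Po(np)) = (1/2) * sum_k |b_k - q_k|."""
    lam = n * p
    q = math.exp(-lam)               # Poisson pmf at k = 0, updated recursively
    total, q_mass = 0.0, 0.0
    for k in range(n + 1):
        b = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        total += abs(b - q)
        q_mass += q
        q *= lam / (k + 1)
    total += max(0.0, 1.0 - q_mass)  # Poisson mass above n, where Bin(n, p) has none
    return 0.5 * total
```

For $n = 100$ and $p = 0.02$ the bound equals $0.02$, and the exact distance returned by `tv_binomial_poisson` lies below it.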

The Stein-Chen method is by no means restricted to approximating sums of indicator random variables. For instance, as far as $\mathbb{Z}_+$-valued random variables are concerned, one might also consider the case where $W$ is itself Poisson distributed with some parameter $\mu > 0$.

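Proposition A.C. Let $\lambda, \mu > 0$. Then

$$d_{TV}(\mathrm{Po}(\lambda), \mathrm{Po}(\mu)) \le \min\Bigl(1, \frac{1}{\sqrt{\lambda}}, \frac{1}{\sqrt{\mu}}\Bigr) \cdot |\lambda - \mu|.$$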

Proof. This proposition is a special case of Barbour, Holst and Janson 
(1992), Theorem 1.C(i). However, the result can be obtained very easily by
direct calculation, using the Stein-Chen method. □ 
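Proposition A.C can be checked numerically by computing $d_{TV}(\mathrm{Po}(\lambda), \mathrm{Po}(\mu)) = \frac12 \sum_{k \ge 0} |\mathrm{Po}(\lambda)\{k\} - \mathrm{Po}(\mu)\{k\}|$ on a truncated range of $k$. The truncation point and the function names are assumptions of this sketch; it also assumes moderate parameters, since `math.exp(-lam)` underflows to zero for $\lambda$ above roughly 745.

```python
import math
from typing import Optional


def poisson_tv(lam: float, mu: float, kmax: Optional[int] = None) -> float:
    """Total variation distance between Po(lam) and Po(mu), truncated at kmax."""
    if kmax is None:
        m = max(lam, mu)
        kmax = int(m + 20.0 * math.sqrt(m) + 50)  # tail mass beyond kmax is negligible
    a, b = math.exp(-lam), math.exp(-mu)           # pmfs at k = 0
    total = 0.0
    for k in range(kmax + 1):
        total += abs(a - b)
        a *= lam / (k + 1)                         # pmf recursion: P(k+1) = P(k) * rate / (k+1)
        b *= mu / (k + 1)
    return 0.5 * total


def prop_ac_bound(lam: float, mu: float) -> float:
    """min(1, 1/sqrt(lam), 1/sqrt(mu)) * |lam - mu|."""
    return min(1.0, 1.0 / math.sqrt(lam), 1.0 / math.sqrt(mu)) * abs(lam - mu)
```

For instance, `prop_ac_bound(4.0, 9.0)` equals $5/3$, which exceeds 1 and is therefore trivial, whereas for nearby parameters such as $\lambda = 2$, $\mu = 2.5$ the bound is informative and the computed distance stays below it.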


A.2. Poisson process approximation of the distribution of the indicator point process Ξ. By applying a natural generalization of the Stein-Chen method as in Barbour and Brown (1992), the following result is obtained.


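Theorem A.D (Local Stein theorem for indicator point processes). With the above definitions and $\pi := \sum_{i \in \Gamma} p_i \delta_{\alpha_i}$, we have

$$d_2(\mathcal{L}(\Xi), \mathrm{Po}(\pi)) \le \Bigl\{1 \wedge \frac{2}{\lambda}\Bigl(1 + 2 \log^+ \frac{\lambda}{2}\Bigr)\Bigr\} \sum_{i \in \Gamma} \bigl(p_i^2 + p_i \mathbb{E} Z_i + \mathbb{E}(I_i Z_i)\bigr) + \Bigl(1 \wedge \frac{1.65}{\sqrt{\lambda}}\Bigr) \sum_{i \in \Gamma} e_i,$$

where

$$e_i = \mathbb{E}\bigl|\mathbb{E}(I_i \mid (I_j : j \in \Gamma_i^w)) - p_i\bigr| = 2 \max_{B \in \sigma(I_j : j \in \Gamma_i^w)} \bigl|\mathrm{cov}(I_i, \mathbf{1}_B)\bigr|.$$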

Proof. See Barbour, Holst and Janson (1992), Theorem 10.F. □ 

Remark A.E. Note that the upper bound in Theorem A.D depends neither on the points $\alpha_i$, $i \in \Gamma$, nor on the specific choice of the metric $d_0$, as long as it is bounded by 1.
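The two Stein factors in Theorem A.D, $1 \wedge \frac{2}{\lambda}(1 + 2\log^+\frac{\lambda}{2})$ and $1 \wedge 1.65/\sqrt{\lambda}$, can be evaluated as in the following sketch (function names hypothetical). In line with Remark A.E, the only inputs are $p_i$, $\mathbb{E} Z_i$, $\mathbb{E}(I_i Z_i)$ and $e_i$; neither the points $\alpha_i$ nor the metric $d_0$ appear.

```python
import math
from typing import Sequence


def c_first(lam: float) -> float:
    """1 ∧ (2/λ)(1 + 2 log⁺(λ/2)): factor of the first sum."""
    return min(1.0, (2.0 / lam) * (1.0 + 2.0 * max(0.0, math.log(lam / 2.0))))


def c_second(lam: float) -> float:
    """1 ∧ 1.65/√λ: factor of the sum of the e_i."""
    return min(1.0, 1.65 / math.sqrt(lam))


def local_stein_process_bound(p: Sequence[float], EZ: Sequence[float],
                              EIZ: Sequence[float], e: Sequence[float]) -> float:
    """Right-hand side of the d_2 bound; the points alpha_i are not needed."""
    lam = sum(p)
    first = sum(pi * pi + pi * ez + eiz for pi, ez, eiz in zip(p, EZ, EIZ))
    return c_first(lam) * first + c_second(lam) * sum(e)
```

For large $\lambda$ the first factor behaves like $4\log\lambda/\lambda$, compared with $1/\lambda$ in Theorem A.A, so the process-level bound is larger by a logarithmic factor.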

Acknowledgment. The author wishes to thank Andrew Barbour for contributing many helpful comments and suggestions to this article.